Semantic browsing of sound databases without keywords

Abstract : In this paper, we study the relevance of a semantic organization of sounds to ease the browsing of a sound database. For such a task, semantic access to data is traditionally implemented by a keyword selection process. However, various limitations of written language, such as word polysemy, ambiguities, or translation issues, may bias the browsing process. We present and study the efficiency of two sound presentation strategies that organize sounds spatially so as to reflect an underlying semantic hierarchy. For the sake of comparison, we also consider a display whose spatial organization is based on acoustic cues. The three displays are evaluated in terms of search speed in a crowdsourcing experiment. Results demonstrate the usefulness of an implicit semantic organization for displaying sounds, both in terms of search speed and of learning efficiency.

Keywords : Audio content management and display, Semantic sound data mining

0 Introduction

With the growing capability of recording and storage devices, the problem of indexing large audio databases has recently received much attention [17]. Most of that effort is dedicated to the automatic inference of indexing metadata from the actual audio recording [18, 16]; in contrast, the ability to browse such databases effectively has received less consideration.

Most media asset management systems are based on keyword-driven queries. The user enters a word that best characterizes the desired item, and the interface presents them with items related to this word. The effectiveness of this principle depends primarily on the typological structure and nomenclature of the database. However, for databases and more specifically for databases of sounds, several issues arise:

1. Sounds, like many other things, can be described in many ways. Sounds may be designated by their sources (a car door), as well as by the action of those sources (the slamming of a car door) or their environments (slamming a car door in a garage) [7, 10, 2]. Designing an effective keyword-based search system requires an accurate description of each sound, one that can be adapted to the way each user represents that sound.
2. Pre-defined verbal descriptions of the sounds made available to the users may bias their browsing and final selection.
3. Localization of the query interface is difficult, as the translation of some words referring to qualitative aspects of the sound, such as its ambience, is notoriously ambiguous and subject to cultural specificities.
4. Unless considerable time and resources are invested in developing a multilingual interface, any system based on verbal descriptions can only be used with reduced performance by non-native speakers of the chosen language.
Document type :
Journal articles

https://hal.archives-ouvertes.fr/hal-01300399
Contributor : Mathieu Lagrange
Submitted on : Wednesday, November 16, 2016 - 11:21:50 AM
Last modification on : Thursday, February 7, 2019 - 3:13:22 PM
Document(s) archived on : Thursday, March 16, 2017 - 5:23:21 PM

File

lafaySsf2016.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01300399, version 2

Citation

Grégoire Lafay, Nicolas Misdariis, Mathieu Lagrange, Mathias Rossignol. Semantic browsing of sound databases without keywords. Journal of the Audio Engineering Society, Audio Engineering Society, 2016, 64 (9), pp.628-635. ⟨hal-01300399v2⟩

Metrics

Record views : 443
File downloads : 101