Semantic browsing of sound databases without keywords

Abstract: In this paper, we study the relevance of a semantic organization of sounds to ease the browsing of a sound database. For such a task, semantic access to data is traditionally implemented by a keyword selection process. However, various limitations of written language, such as word polysemy, ambiguities, or translation issues, may bias the browsing process. We present two sound presentation strategies that organize sounds spatially so as to reflect an underlying semantic hierarchy, and we study their efficiency. For comparison, we also consider a display whose spatial organization is based on acoustic cues. These three displays are evaluated in terms of search speed in a crowdsourcing experiment. Results demonstrate the usefulness of an implicit semantic organization for displaying sounds, both in terms of search speed and of learning efficiency.

Keywords: Audio content management and display, Semantic sound data mining

0 Introduction

With the growing capability of recording and storage devices, the problem of indexing large audio databases has recently received much attention [17]. Most of that effort is dedicated to the automatic inference of indexing metadata from the actual audio recordings [18, 16]; in contrast, the ability to browse such databases effectively has received less consideration. Most media asset management systems are based on keyword-driven queries: the user enters the word that best characterizes the desired item, and the interface presents the user with items related to this word. The effectiveness of this principle depends primarily on the typological structure and nomenclature of the database. However, for databases, and more specifically for databases of sounds, several issues arise:

1. Sounds, like many other things, can be described in many ways. A sound may be designated by its source (a car door), by the action of that source (the slamming of a car door), or by its environment (slamming a car door in a garage) [7, 10, 2]. Designing an effective keyword-based search system therefore requires an accurate description of each sound, one that can be tuned to each user's own representation of sounds.
2. Pre-defined verbal descriptions of the sounds made available to the users may bias their browsing and final selection.
3. Localizing the query interface is difficult, as the translation of words referring to qualitative aspects of a sound, such as its ambience, is notoriously ambiguous and subject to cultural specificities.
4. Unless considerable time and resources are invested in developing a multilingual interface, any system based on verbal descriptions can be used by non-native speakers of the chosen language only with reduced performance.
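To make issue 1 (and, in passing, issue 4) concrete, here is a minimal sketch of exact keyword matching over a toy index. It assumes three hypothetical recordings of the same event (a car door slammed in a garage), each annotated by a different person who chose to describe the source, the action, or the environment. The file names, tags, and the `keyword_search` helper are illustrative assumptions, not part of the paper or of any real system.

```python
# Hypothetical toy index: three recordings of the SAME event (a car door
# slammed in a garage), each tagged by an annotator who described a
# different aspect of the sound. Names and tags are illustrative only.
SOUND_TAGS = {
    "take_01.wav": {"car", "door"},       # described by its source
    "take_02.wav": {"slam", "door"},      # described by the action
    "take_03.wav": {"garage", "reverb"},  # described by its environment
}


def keyword_search(query, index):
    """Return, sorted, the sounds whose tag set contains the query word exactly."""
    word = query.strip().lower()
    return sorted(name for name, tags in index.items() if word in tags)


if __name__ == "__main__":
    # Only one of the three matching recordings is found for "slam".
    print(keyword_search("slam", SOUND_TAGS))
    # A query in another language finds nothing at all.
    print(keyword_search("portière", SOUND_TAGS))
```

Running it prints `['take_02.wav']` and then `[]`: a query on the action misses the two recordings described by their source or environment, and a French query ("portière", car door) matches nothing, even though all three files contain the sound the user is looking for.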
Document type:
Journal article
Journal of the Audio Engineering Society, Audio Engineering Society, 2016, 64 (9), pp.628-635

https://hal.archives-ouvertes.fr/hal-01300399
Contributor: Mathieu Lagrange
Submitted on: Wednesday, 16 November 2016 - 11:21:50
Last modified on: Thursday, 7 February 2019 - 15:13:22
Document(s) archived on: Thursday, 16 March 2017 - 17:23:21

File

lafaySsf2016.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01300399, version 2

Citation

Grégoire Lafay, Nicolas Misdariis, Mathieu Lagrange, Mathias Rossignol. Semantic browsing of sound databases without keywords. Journal of the Audio Engineering Society, Audio Engineering Society, 2016, 64 (9), pp.628-635. 〈hal-01300399v2〉