Semantic browsing of sound databases without keywords

Grégoire Lafay; Nicolas Misdariis; Mathieu Lagrange; Mathias Rossignol

Journal article, Journal of the Audio Engineering Society, Year: 2016

Grégoire Lafay

Role: Corresponding author
PersonId : 718
IdHAL : gregoirelafay

Institut de Recherche en Communications et en Cybernétique de Nantes

Nicolas Misdariis

Role: Author

Equipe Perception et design sonores

Mathieu Lagrange

Role: Author
PersonId : 4329
IdHAL : mathieu-lagrange

Institut de Recherche en Communications et en Cybernétique de Nantes

Mathias Rossignol

Role: Author

Sciences et Technologies de la Musique et du Son

Abstract

In this paper, we study the relevance of a semantic organization of sounds to ease the browsing of a sound database. For such a task, semantic access to data is traditionally implemented by a keyword selection process. However, various limitations of written language, such as word polysemy, ambiguities, or translation issues, may bias the browsing process. We present and study the efficiency of two sound presentation strategies that organize sounds spatially so as to reflect an underlying semantic hierarchy. For the sake of comparison, we also consider a display whose spatial organization is based on acoustic cues. These three displays are evaluated in terms of search speed in a crowdsourcing experiment. Results demonstrate the usefulness of an implicit semantic organization for displaying sounds, both in terms of search speed and of learning efficiency.

Keywords: Audio content management and display, Semantic sound data mining

0 Introduction

With the growing capability of recording and storage devices, the problem of indexing large databases of audio has recently been the object of much attention [17]. Most of that effort is dedicated to the automatic inference of indexing metadata from the actual audio recording [18, 16]; in contrast, the ability to browse such databases in an effective manner has been less considered.

Most media asset management systems are based on keyword-driven queries. The user enters a word that best characterizes the desired item, and the interface presents them with items related to this word. The effectiveness of this principle depends primarily on the typological structure and nomenclature of the database. However, for databases, and more specifically for databases of sounds, several issues arise:

1. Sounds, like many other things, can be described in many ways. A sound may be designated by its source (a car door), by the action of that source (the slamming of a car door), or by its environment (slamming a car door in a garage) [7, 10, 2]. Designing an effective keyword-based search system requires an accurate description of each sound, which has to be tunable to each user's representation of the sound.

2. Pre-defined verbal descriptions of the sounds made available to the users may bias their browsing and final selection.

3. Localization of the query interface is difficult, because the translation of some words referring to qualitative aspects of a sound, such as its ambience, is notoriously ambiguous and subject to cultural specificities.

4. Unless considerable time and resources are invested in developing a multilingual interface, any system based on verbal descriptions can only be used with reduced performance by non-native speakers of the chosen language.
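The keyword-driven query principle and the first issue listed above can be illustrated with a minimal Python sketch. It is not taken from the paper: the file names, tags, and the functions `build_index` and `search` are hypothetical, chosen only to show how the same kind of sound, annotated by source, by action, or by environment, becomes hard to retrieve with a single keyword.

```python
from collections import defaultdict

# Hypothetical toy database (not from the paper): three recordings of a car
# door being slammed, each tagged by a different annotator.
SOUNDS = {
    "door_01.wav": {"car", "door"},  # described by its source
    "door_02.wav": {"slam"},         # described by the action
    "door_03.wav": {"garage"},       # described by the environment
}


def build_index(sounds):
    """Build an inverted index mapping each keyword to the sounds tagged with it."""
    index = defaultdict(set)
    for name, tags in sounds.items():
        for tag in tags:
            index[tag].add(name)
    return index


def search(index, keyword):
    """Return the sorted names of the sounds tagged with exactly this keyword."""
    return sorted(index.get(keyword.lower(), set()))


INDEX = build_index(SOUNDS)

print(search(INDEX, "door"))   # finds only the recording tagged by source
print(search(INDEX, "slam"))   # finds only the recording tagged by action
print(search(INDEX, "porte"))  # a French query finds nothing
```

In this sketch, a user who types "door" retrieves one of the three relevant recordings, and a user who queries in another language retrieves none, which is the kind of limitation that motivates browsing without keywords.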

Domains

Machine Learning [stat.ML]

Main file

lafaySsf2016.pdf (373.82 KB)

Origin: Files produced by the author(s)

Contributor: Mathieu Lagrange

https://hal.science/hal-01300399

Submitted on: Wednesday, November 16, 2016, 11:21:50

Last modified on: Friday, January 5, 2024, 03:25:42

Long-term archiving on: Thursday, March 16, 2017, 17:23:21

Dates and versions

hal-01300399 , version 1 (10-04-2016)

hal-01300399 , version 2 (16-11-2016)

Identifiers

HAL Id : hal-01300399 , version 2

Cite

Grégoire Lafay, Nicolas Misdariis, Mathieu Lagrange, Mathias Rossignol. Semantic browsing of sound databases without keywords. Journal of the Audio Engineering Society, 2016, 64 (9), pp.628-635. ⟨hal-01300399v2⟩

Collections

UNIV-NANTES UPMC CNRS EC-NANTES IRCCYN IRCCYN-ADTSI IRCAM UNAM STMS LS2N SORBONNE-UNIVERSITE SU-SCIENCES ANR NANTES-UNIVERSITE

489 views

264 downloads
