Conference papers

Tracking Sound Sources for Object-based Spatial Audio in 3D Audio-visual Production

Abstract: In immersive and interactive audio-visual content, there is significant scope for spatial misalignment between the two main modalities. So, in productions that have both 3D video and spatial audio, the positioning of sound sources relative to the visual display requires careful attention. This may be achieved in the form of object-based audio, which moreover allows the producer to maintain control over individual elements within the mix. Yet each object's metadata is needed to define its position over time. In the present study, audio-visual studio recordings were made of short scenes representing three genres: drama, sport and music. Foreground video was captured by a light-field camera array, which incorporated a microphone array, alongside more conventional sound recording by spot microphones and a first-order ambisonic room microphone. In the music scenes, a direct feed from the guitar pickup was also recorded. Video data was analysed to form a 3D reconstruction of the scenes, and human figure detection was applied to the 2D frames of the central camera. Visual estimates of the sound source positions were used to provide ground truth. Position metadata were encoded within audio definition model (ADM) format audio files, suitable for standard object-based rendering. The steered response power of the acoustical signals at the microphone array was used, with the phase transform (SRP-PHAT), to determine the dominant source position(s) at any time, and given as input to a Sequential Monte Carlo Probability Hypothesis Density (SMC-PHD) tracker. The tracker output was evaluated in relation to the ground truth. Results indicate a hierarchy of accuracy in azimuth, elevation and range, in accordance with human spatial auditory perception. Azimuth errors were within the tolerance bounds reported by studies of the Ventriloquism Effect, giving an initial promising indication that such an approach may open the door to object-based production for live events.
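The SRP-PHAT front end mentioned in the abstract can be sketched as follows. This is a minimal illustration of steered-response-power localisation with the phase transform, not the authors' implementation: the array geometry, frame length, grid of candidate directions and far-field (plane-wave) assumption are all hypothetical choices made for the example.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau):
    """GCC-PHAT cross-correlation; peak index gives the lag of x1 relative to x2."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    R = X1 * np.conj(X2)
    R /= np.abs(R) + 1e-12            # phase transform: whiten, keep phase only
    cc = np.fft.irfft(R, n)
    max_shift = int(fs * max_tau)     # lags are bounded by mic spacing / c
    # reorder so index (max_shift + lag) covers lags in [-max_shift, +max_shift]
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return cc, max_shift

def srp_phat(frames, mic_pos, fs, grid, c=343.0):
    """Return the candidate unit vector in `grid` with the highest steered power."""
    n_mics = len(frames)
    power = np.zeros(len(grid))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            d = np.linalg.norm(mic_pos[i] - mic_pos[j])
            cc, max_shift = gcc_phat(frames[i], frames[j], fs, d / c)
            for k, u in enumerate(grid):
                # expected lag of mic i's signal behind mic j's for a
                # plane wave arriving from direction u
                tau = np.dot(mic_pos[j] - mic_pos[i], u) / c
                idx = max_shift + int(round(tau * fs))
                power[k] += cc[np.clip(idx, 0, 2 * max_shift)]
    return grid[np.argmax(power)]
```

In a full pipeline the per-frame SRP-PHAT peak(s) would then feed the SMC-PHD tracker as measurements; the grid here covers directions only, whereas the paper reports azimuth, elevation and range.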

https://hal.archives-ouvertes.fr/hal-03235364
Contributor: Claude Inserra
Submitted on: Thursday, May 27, 2021 - 9:32:51 AM
Last modification on: Monday, June 7, 2021 - 6:03:18 PM
Long-term archiving on: Saturday, August 28, 2021 - 6:16:31 PM

File

000884.pdf
Publisher files allowed on an open archive

Citation

Mohd Azri Mohd Izhar, Marco Volino, Adrian Hilton, Philip Jackson. Tracking Sound Sources for Object-based Spatial Audio in 3D Audio-visual Production. Forum Acusticum, Dec 2020, Lyon, France. pp.2051-2058, ⟨10.48465/fa.2020.0884⟩. ⟨hal-03235364⟩
