Hybrid coding/indexing strategy for informed source separation of linear instantaneous under-determined audio mixtures

Mathieu Parvaix¹, Laurent Girin¹, Laurent Daudet², Jonathan Pinel³, Cléo Baras³
GIPSA-DPC - Département Parole et Cognition
GIPSA-DIS - Département Images et Signal
Abstract: We present a system for under-determined source separation of non-stationary audio signals from a linear instantaneous stereo (2-channel) mixture. The system is designed to isolate the different instruments and voices of a piece of music, so that an end-user can manipulate these source signals separately. The problem is addressed with a specific informed approach, implemented as a coder, which corresponds to the music production stage, and a separate decoder, which corresponds to the signal restitution stage. At the coder, the source signals are assumed to be available and are used to i) generate the stereo mix signal, and ii) extract a small set of distinctive features, which are embedded into the mix signal using an inaudible watermarking technique. At the decoder, extracting and exploiting the watermark from the transmitted mix signal enables an end-user who has no direct access to the original source signals to separate them from the mix signal. In the present study, we propose a new hybrid system that merges two informed source separation techniques: one subset of the source signals is encoded using a "sources-channel coding" approach, and another subset is selected for local inversion of the mixture. The corresponding codes and indexes are transmitted to the decoder using a new high-capacity watermarking technique. At the decoder, the encoded source signals are decoded and subtracted from the mix signal; local inversion of the remaining sub-mixture then yields estimates of the second subset of source signals. This hybrid technique efficiently combines the advantages of the coding and inversion approaches. We report experiments in which 5 different source signals are separated from stereo mixtures with remarkable quality, enabling separate manipulation of the sources during music restitution.
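The hybrid decoding step can be illustrated on a single time-frequency bin. The following is a minimal sketch, not the authors' implementation, and rests on several assumptions: the mix is represented in some time-frequency domain where each stereo bin is x = sum_j a_j s_j, with a_j = (left gain, right gain) the real instantaneous mixing gains of source j; these gains are known at the decoder; the coded sources have already been decoded (here, perfectly); and the watermark provides, for each bin, the index set of at most two remaining active sources. All function names (`subtract_coded`, `local_inversion`, `decode_bin`) are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical types: one stereo TF coefficient, and one source's (left, right) gains.
Bin = Tuple[complex, complex]
Gains = Tuple[float, float]


def subtract_coded(x: Bin, mixing: Sequence[Gains], decoded: Dict[int, complex]) -> Bin:
    """Remove the contribution of the decoded (coded) sources from one stereo bin."""
    left, right = x
    for j, s_hat in decoded.items():
        g_left, g_right = mixing[j]
        left -= g_left * s_hat
        right -= g_right * s_hat
    return (left, right)


def local_inversion(r: Bin, mixing: Sequence[Gains], active: Sequence[int]) -> Dict[int, complex]:
    """Estimate the indexed sources of one bin from the residual sub-mixture r.

    Assumes at most two active sources per bin (the stereo case).
    """
    if len(active) == 2:
        i, k = active
        a, c = mixing[i]  # column of source i in the 2x2 sub-matrix
        b, d = mixing[k]  # column of source k
        det = a * d - b * c
        if abs(det) < 1e-12:
            raise ValueError("mixing gains of the two active sources are collinear")
        # Closed-form inverse of [[a, b], [c, d]] applied to r.
        return {i: (d * r[0] - b * r[1]) / det, k: (-c * r[0] + a * r[1]) / det}
    if len(active) == 1:
        (i,) = active
        g_left, g_right = mixing[i]
        # Least-squares projection of r onto the single mixing direction.
        return {i: (g_left * r[0] + g_right * r[1]) / (g_left**2 + g_right**2)}
    return {}  # no indexed source: remaining sources are taken as zero in this bin


def decode_bin(x: Bin, mixing: Sequence[Gains], decoded: Dict[int, complex],
               active: Sequence[int]) -> Dict[int, complex]:
    """Hybrid decoding: subtract coded sources, then locally invert the sub-mixture."""
    residual = subtract_coded(x, mixing, decoded)
    estimates = dict(decoded)
    estimates.update(local_inversion(residual, mixing, active))
    return estimates


if __name__ == "__main__":
    # Illustrative values only: 5 sources panned across the stereo field.
    mixing = [(1.0, 0.0), (0.8, 0.6), (0.6, 0.8), (0.3, 0.95), (0.0, 1.0)]
    true = [0.5 + 0.1j, -0.3j, 1.0 + 0j, 0.2 - 0.4j, 0j]
    x = (sum(g[0] * s for g, s in zip(mixing, true)),
         sum(g[1] * s for g, s in zip(mixing, true)))
    # Sources 0 and 1 are coded; sources 2 and 3 are indexed for local inversion.
    print(decode_bin(x, mixing, {0: true[0], 1: true[1]}, (2, 3)))
```

Subtracting the coded sources first is what makes the inversion well posed: a stereo bin provides only two equations, so once the coded contributions are removed, the residual can be solved exactly for at most two indexed sources.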
