Modelling the interaction between binaural and temporal speech processing
Abstract
When listening to speech in reverberant conditions, listeners benefit from early speech reflections because these can be integrated with the direct speech sound. In contrast, late reflections are typically detrimental because they cannot be integrated with the target speech. Rennies et al. (2019) measured speech reception thresholds (SRTs) in stationary noise in 86 conditions with different numbers and delay times of speech reflections. In some conditions, different interaural phase differences (IPDs) were introduced for the noise, the direct sound and the reflections in order to enable the listeners to use binaural unmasking. By analyzing the binaural room impulse response (BRIR) with the binaural speech intelligibility model (BSIM), Rennies et al. (2019) found that listeners used a temporal window with a length of 100 ms to integrate useful information. Speech reflections outside this window were detrimental. Interestingly, this "useful" window does not necessarily have to be an "early" window, because it is not required to include the direct speech sound. Rather, what matters is that the window includes the maximum number of useful speech reflections. In this study we use the BSIM blindly, that is, without knowledge of the BRIR and without knowledge of the clean speech signal. This is achieved by maximizing the speech-like modulations in the binaural front-end of the model, which applies an equalization-cancellation (EC) model. In this way, the useful speech information is maximized and the detrimental information is minimized blindly. As this model works bottom-up, it can be combined with arbitrary speech intelligibility measures, for instance the speech intelligibility index (SII) or the speech transmission index (STI).
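The idea of a 100 ms "useful" window that need not contain the direct sound can be illustrated with a minimal sketch. It is not the BSIM and not the blind procedure of this study: it assumes the impulse response is known, works on a single channel, and uses energy inside the window as a stand-in for "the maximum number of useful speech reflections". The function name `split_useful_detrimental` and the toy impulse response are hypothetical.

```python
import numpy as np

def split_useful_detrimental(ir, fs, window_s=0.1):
    """Split an impulse response into the part inside the highest-energy
    window (assumed useful) and the remainder (assumed detrimental)."""
    ir = np.asarray(ir, dtype=float)
    n = min(len(ir), int(round(window_s * fs)))  # window length in samples
    # Energy of every window position, computed from a cumulative sum.
    cum = np.concatenate(([0.0], np.cumsum(ir ** 2)))
    window_energy = cum[n:] - cum[:-n]
    start = int(np.argmax(window_energy))  # first position with maximal energy
    useful = np.zeros_like(ir)
    useful[start:start + n] = ir[start:start + n]
    detrimental = ir - useful
    return start, useful, detrimental

# Toy impulse response (assumed values): direct sound at 0 ms and a
# cluster of four reflections between 150 ms and 210 ms.
fs = 1000  # Hz, so one sample corresponds to 1 ms
ir = np.zeros(400)
ir[0] = 1.0
ir[[150, 170, 190, 210]] = 0.8
start, useful, detrimental = split_useful_detrimental(ir, fs)
```

In this toy case the four reflections together carry more energy than the direct sound, so the selected window covers the reflection cluster and the direct sound ends up in the detrimental part, mirroring the observation that the useful window is not necessarily an early window.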