Microphone array postfilter for separation of simultaneous non-stationary sources, Proc. ICASSP, 2004. ,
System for robust 3D speaker tracking using microphone array measurements, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566), pp.2117-2122, 2004. ,
DOI : 10.1109/IROS.2004.1389722
Particle filtering algorithms for tracking an acoustic source in a reverberant environment, IEEE Transactions on Speech and Audio Processing, vol.11, issue.6, pp.826-836, 2003. ,
DOI : 10.1109/TSA.2003.818112
Localization of simultaneous moving sound sources for mobile robot using a frequency-domain steered beamformer approach, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004, pp.1033-1038, 2004. ,
DOI : 10.1109/ROBOT.2004.1307286
Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.32, issue.6, pp.1109-1121, 1984. ,
DOI : 10.1109/TASSP.1984.1164453
Speech enhancement for non-stationary noise environments, Signal Processing, vol.81, issue.11, pp.2403-2418, 2001. ,
DOI : 10.1016/S0165-1684(01)00128-1
Echo avoidance in a computational model of the precedence effect, Speech Communication, vol.27, issue.3-4, 1999. ,
Active speech source localization by a dual coarse-to-fine search, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), pp.3309-3312, 2001. ,
DOI : 10.1109/ICASSP.2001.940366
URL : http://www.umiacs.umd.edu/~dz/pbpslist/icassp01bf.pdf
An application of a particle filter to Bayesian multiple sound source tracking with audio and video information fusion, Proc. Fusion, pp.805-812, 2004. ,
On sequential Monte Carlo sampling methods for Bayesian filtering, Statistics and Computing, vol.10, issue.3, pp.197-208, 2000. ,
DOI : 10.1023/A:1008935410038
Sound event detection in multichannel audio using spatial and harmonic features, Proc IEEE AASP Chall Detect Classif Acoust Scenes Events, 2016. ,
IBM Research TRECVID-2003 video retrieval system, 2003. ,
Deep canonical correlation analysis, Proc Int Conf Mach Learn, 2013. ,
Efficient Source Localization and Tracking in Reverberant Environments Using Microphone Arrays, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005., p.1061, 2005. ,
DOI : 10.1109/ICASSP.2005.1416195
Tracking Multiple Acoustic Sources in Reverberant Environments using Regularized Particle Filter, 2007 15th International Conference on Digital Signal Processing, pp.99-102, 2007. ,
DOI : 10.1109/ICDSP.2007.4288528
Using Steady-State Suppression to Improve Speech Intelligibility in Reverberant Environments for Elderly Listeners, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.7, pp.1775-1780, 2010. ,
DOI : 10.1109/TASL.2010.2052165
Audio-visual speech asynchrony detection using co-inertia analysis and coupled hidden Markov models, Pattern Anal Appl, vol.12, issue.3, pp.271-284, 2008. ,
A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Transactions on Signal Processing, vol.50, issue.2, pp.174-188, 2002. ,
DOI : 10.1109/78.978374
Multimodal fusion for multimedia analysis: a survey, Multimedia Systems, vol.24, issue.11, pp.345-379, 2010. ,
DOI : 10.1115/1.3662552
URL : http://www.comp.nus.edu.sg/%7Emohan/papers/fusion_survey.pdf
Harmony in Motion, 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2007. ,
DOI : 10.1109/CVPR.2007.383344
Exact and Approximate Solutions of Source Localization Problems, IEEE Transactions on Signal Processing, vol.56, issue.5, pp.1770-1778, 2008. ,
DOI : 10.1109/TSP.2007.909342
Neural network combining classifier based on Dempster-Shafer theory for semantic indexing in video content, Adv Multimed Model, pp.196-205, 2006. ,
Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.3, pp.538-549, 2010. ,
DOI : 10.1109/TASL.2010.2041381
URL : https://hal.archives-ouvertes.fr/inria-00557088
Temporal kernel CCA and its application in multimodal neuronal data analysis, Machine Learning, vol.79, issue.1-2, pp.5-27, 2010. ,
DOI : 10.1017/CBO9780511809682
Superdirective Microphone Arrays, pp.19-38, 2001. ,
DOI : 10.1007/978-3-662-04619-7_2
Theoretical noise reduction limits of the generalized sidelobe canceller (GSC) for speech enhancement, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), pp.2965-2968, 1999. ,
DOI : 10.1109/ICASSP.1999.761385
Multi-source TDOA estimation in reverberant audio using angular spectra and clustering, Signal Processing, vol.92, issue.8, pp.1950-1960, 2012. ,
DOI : 10.1016/j.sigpro.2011.09.032
URL : https://hal.archives-ouvertes.fr/inria-00576297
Underdetermined blind source separation using sparse representations, Signal Processing, vol.81, issue.11, pp.2353-2362, 2001. ,
DOI : 10.1016/S0165-1684(01)00120-7
URL : http://iew3.technion.ac.il/~mcib/undetermICA.pdf
Modeling hidden dynamics of multimodal cues for spontaneous agreement and disagreement recognition, Face and Gesture 2011, pp.746-752, 2011. ,
DOI : 10.1109/FG.2011.5771341
Measuring audio and visual speech synchrony: methods and applications, IET International Conference on Visual Information Engineering (VIE 2006), pp.255-260, 2006. ,
DOI : 10.1049/cp:20060538
URL : http://ieeexplore.ieee.org/iel5/4286642/4286643/04286698.pdf
Localization of multiple speakers based on a two step acoustic map analysis, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4349-4352, 2008. ,
DOI : 10.1109/ICASSP.2008.4518618
Acoustic Source Localization With Distributed Asynchronous Microphone Networks, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.2, pp.439-443, 2013. ,
DOI : 10.1109/TASL.2012.2215601
A Robust and Low-Complexity Source Localization Algorithm for Asynchronous Distributed Microphone Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, issue.10, pp.1563-1575, 2015. ,
DOI : 10.1109/TASLP.2015.2439040
High-resolution frequency-wavenumber spectrum analysis, Proceedings of the IEEE, vol.57, issue.8, pp.1408-1418, 1969. ,
DOI : 10.1109/PROC.1969.7278
Coherence and time delay estimation, Proceedings of the IEEE, vol.75, issue.2, pp.236-255, 1987. ,
DOI : 10.1109/PROC.1987.13723
Blind Audiovisual Source Separation Based on Sparse Redundant Representations, IEEE Transactions on Multimedia, vol.12, issue.5, pp.358-371, 2010. ,
DOI : 10.1109/TMM.2010.2050650
URL : https://hal.archives-ouvertes.fr/inria-00541412
Nonlinear video diffusion based on audio-video synchrony, IEEE Trans Multimed, 2010. ,
Large-scale multimodal semantic concept detection for consumer video, Proceedings of the international workshop on Workshop on multimedia information retrieval, MIR '07, pp.255-264, 2007. ,
DOI : 10.1145/1290082.1290118
URL : http://www.ee.columbia.edu/dvmm/publications/07/mir2007-kkalg.pdf
Integrated person identification using voice and facial features, IEE Colloquium on Image Processing for Security Applications, pp.1-4, 1997. ,
DOI : 10.1049/ic:19970380
Boosting and structure learning in dynamic Bayesian networks for audio-visual speaker detection, Object recognition supported by user interaction for service robots, pp.789-794, 2002. ,
DOI : 10.1109/ICPR.2002.1048137
URL : http://www.media.mit.edu/~tanzeem/tanzeem_icpr02.pdf
Nonnegative Matrix and Tensor Factorization, IEEE Signal Process Mag, vol.25, issue.1, pp.142-145, 2008. ,
DOI : 10.1002/9780470747278
A Modified SRP-PHAT Functional for Robust Real-Time Sound Source Localization With Scalable Spatial Sampling, IEEE Signal Processing Letters, vol.18, issue.1, pp.71-74, 2011. ,
DOI : 10.1109/LSP.2010.2091502
Localization of Acoustic Sources Through the Fitting of Propagation Cones Using Multiple Independent Arrays, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.7, pp.1964-1975, 2012. ,
DOI : 10.1109/TASL.2012.2191958
Practical supergain, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.34, issue.3, pp.393-398, 1986. ,
DOI : 10.1109/TASSP.1986.1164847
Audio-Visual Event Recognition in Surveillance Video Sequences, IEEE Transactions on Multimedia, vol.9, issue.2, pp.257-267, 2007. ,
DOI : 10.1109/TMM.2006.886263
A Bilinear Approach to the Position Self-Calibration of Multiple Sensors, IEEE Transactions on Signal Processing, vol.60, issue.2, pp.660-673, 2012. ,
DOI : 10.1109/TSP.2011.2175387
Look who's talking: speaker detection using video and audio correlation, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532), pp.1589-1592, 2000. ,
DOI : 10.1109/ICME.2000.871073
Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.886-893, 2005. ,
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512
Look who's talking: Detecting the dominant speaker in a cluttered scenario, Proc IEEE Int Conf Acoust Speech Signal Process, 2014. ,
Robust Localization in Reverberant Rooms, Microphone Arrays, pp.157-180, 2001. ,
DOI : 10.1007/978-3-662-04619-7_8
A Generalized Steered Response Power Method for Computationally Viable Source Localization, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.8, pp.2510-2526, 2007. ,
DOI : 10.1109/TASL.2007.906694
A Real-Time SRP-PHAT Source Location Implementation using Stochastic Region Contraction (SRC) on a Large-Aperture Microphone Array, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, pp.121-124, 2007. ,
DOI : 10.1109/ICASSP.2007.366631
GSVD-based optimal filtering for single and multimicrophone speech enhancement, IEEE Transactions on Signal Processing, vol.50, issue.9, pp.2230-2244, 2002. ,
DOI : 10.1109/TSP.2002.801937
URL : ftp://ftp.esat.kuleuven.ac.be/pub/SISTA/doclo/reports/01-30.ps.gz
Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.7, pp.1830-1840, 2010. ,
DOI : 10.1109/TASL.2010.2050716
URL : https://hal.archives-ouvertes.fr/inria-00541865
Spatial location priors for Gaussian model based reverberant audio source separation, EURASIP Journal on Advances in Signal Processing, vol.92, issue.4, pp.1-11, 2013. ,
DOI : 10.1007/978-3-642-15995-4_8
URL : https://hal.archives-ouvertes.fr/hal-00870191
Spatial Coherence Functions for Differential Microphones in Isotropic Noise Fields, Microphone Arrays: Signal Processing Techniques and Applications, pp.61-85, 2001. ,
DOI : 10.1007/978-3-662-04619-7_4
Convolutional two-stream network fusion for video action recognition, arXiv preprint arXiv:1604.06573, 2016. ,
DOI : 10.1109/cvpr.2016.213
Maximum likelihood approach for blind audio source separation using time-frequency Gaussian source models, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005., pp.78-81, 2005. ,
DOI : 10.1109/ASPAA.2005.1540173
Learning Joint Statistical Models for Audio-Visual Fusion and Segregation, Proc Adv Neural Inf Process Syst, pp.772-778, 2001. ,
Extended Nonnegative Tensor Factorisation Models for Musical Sound Source Separation, Computational Intelligence and Neuroscience, vol.2008, 2008. ,
DOI : 10.1109/TSA.2005.858005
Using tensor factorisation models to separate drums from polyphonic music, Proc Int Conf Digit Audio Eff, 2009. ,
A Dempster-Shafer Based Fusion Approach for Audio-Visual Speech Recognition with Application to Large Vocabulary French Speech, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2006. ,
DOI : 10.1109/ICASSP.2006.1660091
An algorithm for linearly constrained adaptive array processing, Proceedings of the IEEE, vol.60, issue.8, pp.926-935, 1972. ,
DOI : 10.1109/PROC.1972.8817
GeThR-Net: A Generalized Temporally Hybrid Recurrent Neural Network for Multimodal Information Fusion, 2016. ,
DOI : 10.1109/ICCV.2015.512
Kalman filters for audio-video source localization, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005., pp.118-121, 2005. ,
DOI : 10.1109/ASPAA.2005.1540183
URL : http://www.gehrignet.de/media/pdf/waspaa-October2005.pdf
Statistical Analysis of the Relationship between Audio and Video Speech Parameters for Australian English, Proc ISCA Tutor Res Workshop Audit-Vis Speech Process, pp.133-138, 2003. ,
DBN based multi-stream models for audio-visual speech recognition, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004. ,
DOI : 10.1109/ICASSP.2004.1326155
URL : http://ssli.ee.washington.edu/people/bilmes/mypapers/dbn_icassp04.pdf
Asynchrony modeling for audio-visual speech recognition, Proceedings of the second international conference on Human Language Technology Research, pp.1-6, 2002. ,
DOI : 10.3115/1289189.1289244
URL : http://www.research.ibm.com/AVSTG/HLT02_ASYNCHRONY.pdf
Sparse component analysis, pp.367-420, 2010. ,
DOI : 10.1016/B978-0-12-374726-6.00015-1
URL : https://hal.archives-ouvertes.fr/inria-00541853
An alternative approach to linearly constrained adaptive beamforming, IEEE Transactions on Antennas and Propagation, vol.30, issue.1, pp.27-34, 1982. ,
DOI : 10.1109/TAP.1982.1142739
Source localization in reverberant environments: modeling and statistical analysis, IEEE Transactions on Speech and Audio Processing, vol.11, issue.6, pp.791-803, 2003. ,
DOI : 10.1109/TSA.2003.818027
URL : http://www.itr-rescue.org/pubs/upload/335_Gustafsson,2005.pdf
Canonical Correlation Analysis: An Overview with Application to Learning Methods, Neural Computation, vol.10, issue.12, pp.2639-2664, 2004. ,
DOI : 10.1093/biomet/58.3.433
URL : http://eprints.ecs.soton.ac.uk/9225/01/tech_report03.pdf
Adaptive Filter Theory, 5th edn, 2014. ,
Array signal processing, 1985. ,
Relations between two sets of variates, Biometrika, vol.28, issue.3-4, pp.321-377, 1936. ,
Temporal Multimodal Learning in Audiovisual Speech Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. ,
DOI : 10.1109/CVPR.2016.389
Improving acoustic event detection using generalizable visual features and multi-modality modeling, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.349-352, 2011. ,
DOI : 10.1109/ICASSP.2011.5946412
URL : http://www.ifp.illinois.edu/speech/pubs/2011/huang11icassp.pdf
Real-time passive source localization: a practical linear-correction least-squares approach, IEEE Transactions on Speech and Audio Processing, vol.9, issue.8, pp.943-956, 2001. ,
DOI : 10.1109/89.966097
Error weighted classifier combination for multi-modal human identification, 2005. ,
Multimodal Analysis for Identification and Segmentation of Moving-Sounding Objects, IEEE Transactions on Multimedia, vol.15, issue.2, pp.378-390, 2013. ,
DOI : 10.1109/TMM.2012.2228476
Sparseness-Based 2CH BSS using the EM Algorithm in Reverberant Environment, 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp.147-150, 2007. ,
DOI : 10.1109/ASPAA.2007.4393015
Fusion Methods for Speech Enhancement and Audio Source Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.7, pp.1266-1279, 2016. ,
DOI : 10.1109/TASLP.2016.2553441
URL : https://hal.archives-ouvertes.fr/hal-01120685
Discovering joint audio-visual codewords for video event detection, Machine Vision and Applications, vol.9, issue.1, pp.33-47, 2014. ,
DOI : 10.1145/2324796.2324843
Short-term audiovisual atoms for generic video concept classification, Proc ACM Int Conf Multimed, pp.5-14, 2009. ,
DOI : 10.1145/1631272.1631277
URL : http://labrosa.ee.columbia.edu/~dpwe/pubs/JiangCCEL09-ST-AVA.pdf
Audio-visual grouplet, Proceedings of the 19th ACM international conference on Multimedia, MM '11, pp.123-132, 2011. ,
DOI : 10.1145/2072298.2072316
High-level event recognition in unconstrained videos, International Journal of Multimedia Information Retrieval, vol.73, issue.2, pp.73-101, 2013. ,
DOI : 10.1007/s11263-006-9794-4
Columbia-UCF TRECVID2010 multimedia event detection: Combining multiple modalities, contextual concepts, and temporal matching, Proc NIST TRECVID-2003, 2003. ,
Temporal Integration for Audio Classification With Application to Musical Instrument Classification, IEEE Transactions on Audio, Speech, and Language Processing, vol.17, issue.1, 2008. ,
DOI : 10.1109/TASL.2008.2007613
URL : http://perso.telecom-paristech.fr/~grichard/Publications/TSALP_joder08.pdf
Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), pp.2985-2988, 2000. ,
DOI : 10.1109/ICASSP.2000.861162
Large-Scale Video Classification with Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.1725-1732, 2014. ,
DOI : 10.1109/CVPR.2014.223
URL : http://www.cs.cmu.edu/~rahuls/pub/cvpr2014-deepvideo-rahuls.pdf
Feature discovery under contextual supervision using mutual information, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks, pp.79-84, 1992. ,
DOI : 10.1109/IJCNN.1992.227286
Pixels that Sound, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.88-95, 2005. ,
DOI : 10.1109/CVPR.2005.274
HMM based structuring of tennis videos using visual and audio cues, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698), pp.309-312, 2003. ,
DOI : 10.1109/ICME.2003.1221310
On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.20, issue.3, pp.226-239, 1998. ,
DOI : 10.1109/34.667881
Kalman Filters for Time Delay of Arrival-Based Source Localization, EURASIP Journal on Advances in Signal Processing, vol.11, issue.3, pp.1-15, 2006. ,
DOI : 10.1155/ASP/2006/12378
The generalized correlation method for estimation of time delay, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.24, issue.4, pp.320-327, 1976. ,
DOI : 10.1109/TASSP.1976.1162830
Tensor Decompositions and Applications, SIAM Review, vol.51, issue.3, pp.455-500, 2009. ,
DOI : 10.1137/07070111X
ImageNet classification with deep convolutional neural networks, Proc Adv Neural Inf Process Syst, pp.1097-1105, 2012. ,
DOI : 10.1162/neco.2009.10-08-881
URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf
Model for the interaural time differences in the azimuthal plane, The Journal of the Acoustical Society of America, vol.62, issue.1, pp.157-167, 1977. ,
DOI : 10.1121/1.381498
Kernel and Nonlinear Canonical Correlation Analysis, International Journal of Neural Systems, vol.11, issue.2, pp.365-378, 2000. ,
DOI : 10.1162/089976698300017467
Multiple-Hypothesis Extended Particle Filter for Acoustic Source Localization in Reverberant Environments, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.6, pp.1540-1555, 2011. ,
DOI : 10.1109/TASL.2010.2093517
Multimedia content processing through cross-modal association, Proceedings of the eleventh ACM international conference on Multimedia, MULTIMEDIA '03, 2003. ,
DOI : 10.1145/957013.957143
Audio-visual musical instrument recognition, 2011. ,
Source Separation of Convolutive and Noisy Mixtures Using Audio-Visual Dictionary Learning and Probabilistic Time-Frequency Masking, IEEE Transactions on Signal Processing, vol.61, issue.22, p.5520, 2013. ,
DOI : 10.1109/TSP.2013.2277834
An overview of informed audio source separation, 2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), pp.1-4, 2013. ,
DOI : 10.1109/WIAMIS.2013.6616139
URL : https://hal.archives-ouvertes.fr/hal-00958661
Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004. ,
DOI : 10.1023/B:VISI.0000029664.99615.94
URL : http://www.cs.ubc.ca/~lowe/papers/ijcv03.ps
Anomaly detection in crowded scenes, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p.250, 2010. ,
DOI : 10.1109/CVPR.2010.5539872
URL : http://www.svcl.ucsd.edu/publications/conference/2010/cvpr2010/anomaly.pdf
Blind Speech Separation, 2007. ,
DOI : 10.1007/978-1-4020-6479-1
Evaluating Source Separation Algorithms With Reverberant Speech, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.7, pp.1872-1883, 2010. ,
DOI : 10.1109/TASL.2010.2052252
EM Localization and Separation using Interaural Level and Phase Cues, 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp.275-278, 2007. ,
DOI : 10.1109/ASPAA.2007.4392987
URL : http://www.ee.columbia.edu/ln/labrosa/proceeds/waspaa/2007/paper/0026.pdf
Cross-Modal Integration for Performance Improving in Multimedia: A Review, Multimodal processing and interaction, pp.1-46, 2008. ,
DOI : 10.1007/978-0-387-76316-3_1
A steered response power iterative method for high-accuracy acoustic source localization, The Journal of the Acoustical Society of America, vol.134, issue.4, pp.2627-2630, 2013. ,
DOI : 10.1121/1.4820885
Decision level combination of multiple modalities for recognition and analysis of emotional expression, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.2462-2465, 2010. ,
DOI : 10.1109/ICASSP.2010.5494890
An overview on video forensics, APSIPA Transactions on Signal and Information Processing, vol.5284, p.2, 2012. ,
DOI : 10.1109/TIP.2009.2028251
URL : https://doi.org/10.1017/atsip.2012.2
Learning Multimodal Dictionaries, IEEE Transactions on Image Processing, vol.16, issue.9, pp.2272-2283, 2007. ,
DOI : 10.1109/TIP.2007.901813
URL : https://hal.archives-ouvertes.fr/inria-00544772
Audiovisual Gestalts, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06), pp.200-200, 2006. ,
DOI : 10.1109/CVPRW.2006.34
Learning Bimodal Structure in Audio-Visual Data, IEEE Transactions on Neural Networks, vol.20, issue.12, pp.1898-1910, 2009. ,
DOI : 10.1109/TNN.2009.2032182
URL : https://infoscience.epfl.ch/record/125304/files/IEEETNN_final.pdf
Introduction to the psychology of hearing, 1977. ,
Dynamic Bayesian Networks: Representation, Inference and Learning, 2002. ,
Audio-visual event detection using duration dependent input output Markov models, Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL 2001), pp.39-43, 2001. ,
DOI : 10.1109/IVL.2001.990854
A coupled HMM for audiovisual speech recognition, Proc IEEE Int Conf Acoust Speech Signal Process, 2002. ,
DOI : 10.1109/icassp.2002.1006167
URL : http://www.cs.ubc.ca/~murphyk/Papers/icassp02.pdf
Multimodal deep learning, Proc Int Conf Mach Learn, pp.689-696, 2011. ,
Query-adaptive late fusion with neural network for instance search, 2015 IEEE 17th International Workshop on Multimedia Signal Processing (MMSP), pp.1-6, 2015. ,
DOI : 10.1109/MMSP.2015.7340795
Direction of Arrival Based Spatial Covariance Model for Blind Sound Source Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.3, pp.727-739, 2014. ,
DOI : 10.1109/TASLP.2014.2303576
Acoustic event localization using a crosspower-spectrum phase based technique, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing, 1994. ,
DOI : 10.1109/ICASSP.1994.389667
Bayesian Nonparametrics for Microphone Array Processing, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.2, pp.493-504, 2014. ,
DOI : 10.1109/TASLP.2013.2294582
Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.3, pp.550-563, 2010. ,
DOI : 10.1109/TASL.2009.2031510
Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011. ,
DOI : 10.1109/ICASSP.2011.5946389
URL : https://hal.archives-ouvertes.fr/inria-00564851
A General Flexible Framework for the Handling of Prior Information in Audio Source Separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.4, pp.1118-1133, 2012. ,
DOI : 10.1109/TASL.2011.2172425
URL : https://hal.archives-ouvertes.fr/hal-00626962
Motion informed audio source separation, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017. ,
DOI : 10.1109/ICASSP.2017.7951787
URL : https://hal.archives-ouvertes.fr/hal-01447977
Particle swarm localization of acoustic sources in the presence of reverberation, 2006 IEEE International Symposium on Circuits and Systems, p.4, 2006. ,
DOI : 10.1109/ISCAS.2006.1693689
Convolutive blind separation of non-stationary sources, IEEE Transactions on Speech and Audio Processing, vol.8, issue.3, pp.320-327, 2000. ,
DOI : 10.1109/89.841214
Closed-form self-localization of asynchronous microphone arrays, 2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays, pp.139-144, 2011. ,
DOI : 10.1109/HSCMA.2011.5942380
Spectral analysis of signals, NJ, 2005. ,
Vision of the unseen, ACM Computing Surveys, vol.43, issue.4, p.26, 2011. ,
DOI : 10.1145/1978802.1978805
Ensemble-based classifiers, Artificial Intelligence Review, vol.13, issue.4, 2010. ,
DOI : 10.1142/5686
ESPRIT-estimation of signal parameters via rotational invariance techniques, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.37, issue.7, pp.984-995, 1989. ,
DOI : 10.1109/29.32276
Event detection in field sports video using audio-visual features and a support vector machine, IEEE Transactions on Circuits and Systems for Video Technology, vol.15, issue.10, pp.1225-1233, 2005. ,
DOI : 10.1109/TCSVT.2005.854237
A Robust and Precise Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation, IEEE Transactions on Speech and Audio Processing, vol.12, issue.5, pp.530-538, 2004. ,
DOI : 10.1109/TSA.2004.832994
Passive source localization employing intersecting spherical surfaces from time-of-arrival differences, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.35, issue.8, pp.1223-1225, 1987. ,
DOI : 10.1109/TASSP.1987.1165266
Disambiguation of TDOA Estimation for Multiple Sources in Reverberant Environments, IEEE Transactions on Audio, Speech, and Language Processing, vol.16, issue.8, pp.1479-1489, 2008. ,
DOI : 10.1109/TASL.2008.2004533
Multiple emitter location and signal parameter estimation, IEEE Transactions on Antennas and Propagation, vol.34, issue.3, pp.276-280, 1986. ,
DOI : 10.1109/TAP.1986.1143830
Two multimodal approaches for single microphone source separation, 2016 24th European Signal Processing Conference (EUSIPCO), 2016. ,
DOI : 10.1109/EUSIPCO.2016.7760220
URL : https://hal.archives-ouvertes.fr/hal-01400542
Soft nonnegative matrix co-factorization with application to multimodal speaker diarization, Proc IEEE Int Conf Acoust Speech Signal Process, 2013. ,
DOI : 10.1109/icassp.2013.6638316
Soft Nonnegative Matrix Co-Factorization, IEEE Transactions on Signal Processing, vol.62, issue.22, p.99, 2014. ,
DOI : 10.1109/TSP.2014.2360141
URL : https://hal.archives-ouvertes.fr/hal-01116863
Machine listening techniques as a complement to video image analysis in forensics, 2016 IEEE International Conference on Image Processing (ICIP), pp.948-952, 2016. ,
DOI : 10.1109/ICIP.2016.7532497
URL : https://hal.archives-ouvertes.fr/hal-01393959
Low-rank Approximation Based Multichannel Wiener Filter Algorithms for Noise Reduction with Application in Cochlear Implants, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.4, pp.785-799, 2014. ,
DOI : 10.1109/TASLP.2014.2304240
URL : https://hal.archives-ouvertes.fr/hal-01390918
Acoustic location of gunshots using combined angle of arrival and time of arrival measurements, p.589, 2009. ,
Nonnegative CCA for Audiovisual Source Separation, 2007 IEEE Workshop on Machine Learning for Signal Processing, pp.253-258, 2007. ,
DOI : 10.1109/MLSP.2007.4414315
URL : http://hci.iwr.uni-heidelberg.de/people/bommer/papers/0_nonnegative_cca.pdf
Audio Visual Independent Components, Proc Int Symp Indep Compon Anal Blind Signal Sep, pp.709-714, 2003. ,
Multimodal learning with deep Boltzmann machines, Proc Adv Neural Inf Process Syst, pp.2222-2230, 2012. ,
Joint audio-video object localization and tracking, IEEE Signal Processing Magazine, vol.18, issue.1, pp.22-31, 2001. ,
DOI : 10.1109/79.911196
Distributed Kalman filter-based speaker tracking in microphone array networks, Applied Acoustics, vol.89, pp.71-77, 2015. ,
DOI : 10.1016/j.apacoust.2014.09.004
Multichannel semi-blind source separation via local Gaussian modeling for acoustic echo reduction, Proc Eur Signal Process Conf, 2011. ,
Simultaneous Optimization of Acoustic Echo Reduction, Speech Dereverberation, and Noise Reduction against Mutual Interference, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.11, pp.1612-1623, 2014. ,
DOI : 10.1109/TASLP.2014.2341918
Real-time acoustic source localization in noisy environments for human-robot multimodal interaction, RO-MAN 2007, The 16th IEEE International Symposium on Robot and Human Interactive Communication, 2007. ,
DOI : 10.1109/ROMAN.2007.4415116
Geometric calibration of distributed microphone arrays from acoustic source correspondences, 2010 IEEE International Workshop on Multimedia Signal Processing, pp.13-18, 2010. ,
DOI : 10.1109/MMSP.2010.5661986
Robust 3D Localization and Tracking of Sound Sources Using Beamforming and Particle Filtering, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2006. ,
DOI : 10.1109/ICASSP.2006.1661100
URL : http://www.gel.usherb.ca/laborius/papers/ICASSP2006.pdf
Detection of documentary scene changes by audiovisual fusion, Proc Int Conf Image Video Retr, pp.227-238, 2003. ,
From Blind to Guided Audio Source Separation: How models and side information can improve the separation of sound, IEEE Signal Processing Magazine, vol.31, issue.3, pp.107-115, 2014. ,
DOI : 10.1109/MSP.2013.2297440
URL : https://hal.archives-ouvertes.fr/hal-00922378
Automatic monitoring of activities of daily living based on real-life acoustic sensor data: a preliminary study, Proc Int Workshop Speech Lang Process Assist Technol, pp.113-118, 2013. ,
Time-Frequency Masking for Speech Separation and Its Potential for Hearing Aid Design, Trends in Amplification, vol.52, issue.20, pp.332-352, 2008. ,
DOI : 10.1109/TSP.2004.828896
URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4111459/pdf
Voice source localization for automatic camera pointing system in videoconferencing, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997. ,
DOI : 10.1109/ICASSP.1997.599595
Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, vol.73, issue.2, pp.60-79, 2013. ,
DOI : 10.1007/s11263-006-9794-4
URL : https://hal.archives-ouvertes.fr/hal-00725627
Multimodal information fusion for video concept detection, Proc IEEE Int Conf Image Process, pp.2391-2394, 2004. ,
Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification, Proceedings of the ACM International Conference on Multimedia, MM '14, pp.167-176, 2014. ,
DOI : 10.1038/nrn2331
Probabilistic latent tensor factorisation, Proc Int Conf Latent Var Anal Signal Sep, pp.346-353, 2010. ,
Coupled Nonnegative Matrix Factorization Unmixing for Hyperspectral and Multispectral Data Fusion, IEEE Transactions on Geoscience and Remote Sensing, vol.50, issue.2, pp.528-537, 2012. ,
DOI : 10.1109/TGRS.2011.2161320
Matrix co-factorization on compressed sensing, Proc Int Joint Conf Artif Intell, 2011. ,
Discriminations of interaural phase differences, The Journal of the Acoustical Society of America, vol.55, issue.6, pp.1299-1303, 1974. ,
DOI : 10.1121/1.1914701
Integration of acoustic and visual speech signals using neural networks, IEEE Communications Magazine, vol.27, issue.11, pp.65-71, 1989. ,
DOI : 10.1109/35.41402
Distributed Marginalized Auxiliary Particle Filter for Speaker Tracking in Distributed Microphone Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.11, pp.1921-1934, 2016. ,
DOI : 10.1109/TASLP.2016.2590146
Accelerated Speech Source Localization via a Hierarchical Search of Steered Response Power, IEEE Transactions on Speech and Audio Processing, vol.12, issue.5, pp.499-508, 2004. ,
DOI : 10.1109/TSA.2004.832990