M. S. Brandstein and H. F. Silverman, A practical methodology for speech source localization with microphone arrays, Computer Speech & Language, vol.11, issue.2, 1997.
DOI : 10.1006/csla.1996.0024

D. Chai and K. Ngan, Locating facial region of a head-and-shoulders color image, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998.
DOI : 10.1109/AFGR.1998.670936

M. E. Farmer, R. Hsu, and A. K. Jain, Interacting multiple model (IMM) Kalman filters for robust high speed human motion tracking, Object recognition supported by user interaction for service robots, p.2, 2002.
DOI : 10.1109/ICPR.2002.1048226

F. Gustafsson and F. Gunnarsson, Positioning using time-difference of arrival measurements, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2003.
DOI : 10.1109/ICASSP.2003.1201741

T. M. Hospedales and S. Vijayakumar, Structure Inference for Bayesian Multisensory Scene Understanding, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.12, 2008.
DOI : 10.1109/TPAMI.2008.25

Q. Nguyen and J. Choi, Audio-visual data fusion for tracking the direction of multiple speakers, Control Automation and Systems, 2010.

E. Osuna, R. Freund, and F. Girosi, Training support vector machines: an application to face detection, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1997.
DOI : 10.1109/CVPR.1997.609310

B. D. Rao and M. M. Trivedi, Multimodal information fusion using the iterative decoding algorithm and its application to audio-visual speech recognition, Acoustics, Speech, and Signal Processing.

H. Schneiderman and T. Kanade, Probabilistic modeling of local appearance and spatial relationships for object recognition, CVPR'98, 1998.

C. G. Snoek, Early versus late fusion in semantic video analysis, Proceedings of the 13th annual ACM international conference on Multimedia, MULTIMEDIA '05, 2005.
DOI : 10.1145/1101149.1101236

S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics (Intelligent Robotics and Autonomous Agents), 2005.

R. Vaillant, C. Monrocq, and Y. Le Cun, Original approach for the localisation of objects in images, Vision, Image and Signal Processing, 1994.

J. Valin, F. Michaud, B. Hadjou, and J. Rouat, Localization of simultaneous moving sound sources for mobile robot using a frequency-domain steered beamformer approach, IEEE International Conference on Robotics and Automation (ICRA '04), 2004.
DOI : 10.1109/ROBOT.2004.1307286

P. Viola and M. Jones, Robust real-time face detection, p.4, 2004.

C. Zhang and Z. Zhang, A survey of recent advances in face detection, 2010.