C. G. Snoek and M. Worring, Multimodal Video Indexing: A Review of the State-of-the-art, Multimedia Tools and Applications, vol.25, issue.1, pp.5-35, 2005.
DOI : 10.1023/B:MTAP.0000046380.27575.a5

T. Chua, S. Chang, L. Chaisorn, and W. Hsu, Story boundary detection in large broadcast news video archives, Proceedings of the 12th annual ACM international conference on Multimedia, MULTIMEDIA '04, pp.656-659, 2004.
DOI : 10.1145/1027527.1027679

F. Wang, Y. Ma, H. Zhang, and J. Li, A generic framework for semantic sports video analysis using dynamic Bayesian networks, International Multi-Media Modeling Conference, pp.115-122, 2005.

Y. Li, S. S. Narayanan, and C. Kuo, Adaptive speaker identification with audiovisual cues for movie content analysis, Pattern Recognition Letters, vol.25, issue.7, pp.777-791, 2004.
DOI : 10.1016/j.patrec.2004.01.004

M. Covell, S. Baluja, and M. Fink, Detecting Ads in Video Streams Using Acoustic and Visual Cues, Computer, vol.39, issue.12, pp.135-137, 2006.
DOI : 10.1109/MC.2006.421

C. Herley, ARGOS: automatically extracting repeating objects from multimedia streams, IEEE Transactions on Multimedia, vol.8, issue.1, pp.115-129, 2006.
DOI : 10.1109/TMM.2005.861286
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.219.4065

X. Yang, Q. Tian, and P. Xue, Efficient Short Video Repeat Identification With Application to News Video Structure Analysis, IEEE Transactions on Multimedia, vol.9, issue.3, pp.600-609, 2007.
DOI : 10.1109/TMM.2006.889352

A. Divakaran, K. A. Peker, R. Radhakrishnan, Z. Y. Xiong, and R. Cabasson, Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors, p.4, 2003.
DOI : 10.1007/978-1-4757-6928-9_4

Y. Wang, H. Jiang, M. S. Drew, Z. Li, and G. Mori, Unsupervised Discovery of Action Classes, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.1654-1661, 2006.
DOI : 10.1109/CVPR.2006.321

C. Ma and C. Lee, Unsupervised anchor shot detection using multi-modal spectral clustering, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.813-816, 2008.
DOI : 10.1109/ICASSP.2008.4517734

A. Dielmann, Unsupervised detection of multimodal clusters in edited recordings, 2010 IEEE International Workshop on Multimedia Signal Processing, 2010.
DOI : 10.1109/MMSP.2010.5662015

M. Ben and G. Gravier, Unsupervised mining of audiovisually consistent segments in videos with application to structure analysis, 2011 IEEE International Conference on Multimedia and Expo, 2011.
DOI : 10.1109/ICME.2011.6011951
URL : https://hal.archives-ouvertes.fr/hal-00646603

A. Ta, M. Ben, and G. Gravier, Improving Cluster Selection and Event Modeling in Unsupervised Mining for Automatic Audiovisual Video Structuring, The 18th Int. Conf. on MultiMedia Modeling, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00671157

A. Vinciarelli, A. Dielmann, S. Favre, and H. Salamin, Canal9: A database of political debates for analysis of social interactions, 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, 2009.
DOI : 10.1109/ACII.2009.5349466