Multi-modal and Cross-Modal for Lecture Videos Retrieval - Archive ouverte HAL
Conference paper, 2014

Multi-modal and Cross-Modal for Lecture Videos Retrieval

Abstract

This paper studies the problem of multi-modal and cross-modal lecture video retrieval using document analysis techniques. In this context, a lecture video is represented as a set of subjects, where each subject is represented by a bag of mixed words (visual words and textual words) obtained from speech recognition and OCR engines. Our work relies on two assumptions: 1) a video may contain multiple subjects, and 2) multiple modalities coexist in the same lecture video document. We propose a combination of techniques from image document analysis and text mining. Visual words and textual words are extracted from images of lecture slides through text detection and graphics localization applied to sequences captured with a camera. Assuming that a subject in the video is composed of a set of slides, the slides are clustered into groups representing candidate subjects on the basis of the extracted mixed words. Multi-modal and cross-modal lecture video retrieval are then realized with the Bag of Subjects model. We discuss the proposed indexing and retrieval approach and report a quantitative evaluation on lecture videos from our University, showing that the Bag of Subjects model improves retrieval accuracy.
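The pipeline described above (a bag of mixed words per slide, slides clustered into subjects, retrieval ranked at the subject level) can be sketched as follows. This is a minimal illustration, not the authors' exact method: the greedy threshold clustering, the cosine scoring, and the `t_`/`v_` word prefixes marking textual versus visual words are all assumptions made for the sketch.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bags of mixed words.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_slides(slides, threshold=0.3):
    # Greedy clustering: each slide joins the most similar existing
    # subject if similarity exceeds the threshold, else starts a new one.
    subjects = []  # each subject is a Counter aggregating its slides' words
    for bag in slides:
        best, best_sim = None, 0.0
        for subject in subjects:
            sim = cosine(bag, subject)
            if sim > best_sim:
                best, best_sim = subject, sim
        if best is not None and best_sim >= threshold:
            best.update(bag)
        else:
            subjects.append(Counter(bag))
    return subjects

def retrieve(query: Counter, subjects):
    # Rank subjects by similarity to the query bag; the query may hold
    # textual words, visual words, or both (cross-modal retrieval).
    return sorted(subjects, key=lambda s: cosine(query, s), reverse=True)
```

Because subjects aggregate both word types, a purely textual query can still match a subject through slides that share visual words with it, which is the cross-modal effect the abstract refers to.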
No file deposited

Dates and versions

hal-01247962 , version 1 (23-12-2015)

Identifiers

Cite

Nhu-Van Nguyen, Mickaël Coustaty, Jean-Marc Ogier. Multi-modal and Cross-Modal for Lecture Videos Retrieval. 22nd International Conference on Pattern Recognition (ICPR 2014), Aug 2014, Stockholm, Sweden. pp.2667-2672, ⟨10.1109/ICPR.2014.461⟩. ⟨hal-01247962⟩

Collections

L3I UNIV-ROCHELLE

