Learning from narrated instruction videos

Automatic assistants could guide a person or a robot in performing new tasks, such as changing a car tire or repotting a plant. Creating such assistants, however, is non-trivial and requires understanding both the visual and verbal content of a video. Towards this goal, we address the problem of automatically learning the main steps of a task from a set of narrated instruction videos. We develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method sequentially clusters textual and visual representations of a task, where the two clustering problems are linked by joint constraints to obtain a single coherent sequence of steps in both modalities. To evaluate our method, we collect and annotate a new, challenging dataset of real-world instruction videos from the Internet. The dataset contains videos for five different tasks with complex interactions between people and objects, captured in a variety of indoor and outdoor settings. We experimentally demonstrate that the proposed method can automatically discover, learn, and localize the main steps of a task in input videos.

Keywords

Unsupervised learning, Narrated instruction videos, Step discovery

Domains

Computer Vision and Pattern Recognition [cs.CV], Artificial Intelligence [cs.AI]

Main file

pami2016alayrac.pdf (6.88 MB)

Origin: Files produced by the author(s)

Contributor: Jean-Baptiste Alayrac

https://hal.science/hal-01580630

Submitted on: Friday, September 1, 2017, 19:22:48

Last modified on: Friday, April 19, 2024, 16:18:56

Long-term archiving on: Saturday, December 2, 2017, 14:36:48

Dates and versions

hal-01580630, version 1 (2017-09-01)

Identifiers

HAL Id: hal-01580630, version 1

Cite

Jean-Baptiste Alayrac, Piotr Bojanowski, Nishant Agrawal, Josef Sivic, Ivan Laptev, et al. Learning from narrated instruction videos. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, XX. ⟨hal-01580630⟩

Collections

ENS-PARIS, CNRS, INRIA, INRIA2, PSL
