Convolutional Nets and Watershed Cuts for Real-Time Semantic Labeling of RGBD Videos

Abstract: This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on handcrafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. Using a frame-by-frame labeling, we obtain nearly state-of-the-art performance on the NYU-v2 depth dataset with an accuracy of 64.5%. We then show that the labeling can be further improved by exploiting the temporal consistency in the video sequence of the scene. To that end, we present a method producing temporally consistent superpixels from a streaming video. Among the different methods producing superpixel segmentations of an image, the graph-based approach of Felzenszwalb and Huttenlocher is broadly employed. One of its interesting properties is that the regions are computed in a greedy manner in quasi-linear time by using a minimum spanning tree. In a framework exploiting minimum spanning trees all along, we propose an efficient video segmentation approach that computes temporally consistent pixels in a causal manner, filling the need for causal and real-time applications. We illustrate the labeling of indoor scenes in video sequences that could be processed in real-time using appropriate hardware such as an FPGA.
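
To make the graph-based superpixel step mentioned in the abstract more concrete, the following is a minimal sketch of the greedy region merging of Felzenszwalb and Huttenlocher on a single grayscale image. It is not the authors' implementation and does not include the temporally consistent, causal video extension. The function name `segment`, the scale parameter `k`, the 4-connected pixel grid, and the use of absolute intensity differences as edge weights are all assumptions made for illustration.

```python
def segment(image, k=1.0):
    """Label each pixel of a 2D grayscale image (list of rows) with a region id.

    Illustrative sketch only: edges are visited in nondecreasing weight order
    (as in Kruskal's minimum spanning tree algorithm) and two regions are
    merged when the edge weight does not exceed either region's internal
    difference plus a size-dependent tolerance k / |region|.
    """
    h, w = len(image), len(image[0])
    parent = list(range(h * w))      # union-find forest over pixels
    size = [1] * (h * w)             # number of pixels per region root
    internal = [0.0] * (h * w)       # largest merged edge weight per region root

    def find(x):
        # Find the region root, shortening the path as we go (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Build the 4-connected grid graph (assumed connectivity).
    edges = []
    for y in range(h):
        for x in range(w):
            p = y * w + x
            if x + 1 < w:
                edges.append((abs(image[y][x] - image[y][x + 1]), p, p + 1))
            if y + 1 < h:
                edges.append((abs(image[y][x] - image[y + 1][x]), p, p + w))
    edges.sort()

    for weight, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        threshold = min(internal[ra] + k / size[ra], internal[rb] + k / size[rb])
        if weight <= threshold:
            if size[ra] < size[rb]:
                ra, rb = rb, ra      # attach the smaller region to the larger
            parent[rb] = ra
            size[ra] += size[rb]
            # Edges arrive sorted, so this weight is the new internal maximum.
            internal[ra] = weight

    return [[find(y * w + x) for x in range(w)] for y in range(h)]
```

Calling `segment([[0, 0, 10, 10], [0, 0, 10, 10]])` groups the two flat halves of the image into separate regions, because the weight-10 edges between them exceed the merge tolerance. Sorting the edges dominates the cost, which is where the quasi-linear running time comes from.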
Document type:
Journal article
Journal of Machine Learning Research, 2014, 15, pp. 3489-3511

https://hal.archives-ouvertes.fr/hal-01066586
Contributor: Laurent Najman
Submitted on: Tuesday, 23 September 2014 - 17:13:34
Last modified on: Tuesday, 11 November 2014 - 14:56:58
Document(s) archived on: Friday, 14 April 2017 - 15:32:11

File

couprie_jmlr2014_depth.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01066586, version 1

Citation

Camille Couprie, Clément Farabet, Laurent Najman, Yann LeCun. Convolutional Nets and Watershed Cuts for Real-Time Semantic Labeling of RGBD Videos. Journal of Machine Learning Research, 2014, 15, pp. 3489-3511. 〈hal-01066586〉
