Convolutional Nets and Watershed Cuts for Real-Time Semantic Labeling of RGBD Videos

Camille Couprie; Clément Farabet; Laurent Najman; Yann Lecun

Journal article, Journal of Machine Learning Research, Year: 2014

Camille Couprie

Role: Author
PersonId: 880568

IFP Energies nouvelles

Clément Farabet

Role: Author

Twitter Inc

Laurent Najman

Role: Author
PersonId: 28
IdHAL: laurent-najman
ORCID: 0000-0002-6190-0235
IdRef: 087172712

Laboratoire d'Informatique Gaspard-Monge

Yann Lecun

Role: Author

Courant Institute of Mathematical Sciences [New York]

Facebook AI Research [Paris]

Abstract

This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on handcrafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. Using frame-by-frame labeling, we obtain nearly state-of-the-art performance on the NYU-v2 depth dataset, with an accuracy of 64.5%. We then show that the labeling can be further improved by exploiting the temporal consistency of the video sequence of the scene. To that end, we present a method that produces temporally consistent superpixels from a streaming video. Among the different methods producing superpixel segmentations of an image, the graph-based approach of Felzenszwalb and Huttenlocher is broadly employed. One of its interesting properties is that the regions are computed greedily in quasi-linear time by using a minimum spanning tree. In a framework that exploits minimum spanning trees throughout, we propose an efficient video segmentation approach that computes temporally consistent pixels in a causal manner, filling the need for causal and real-time applications. We illustrate the labeling of indoor scenes in video sequences that could be processed in real time using appropriate hardware such as an FPGA.
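
The greedy, minimum-spanning-tree style region merging that the abstract attributes to Felzenszwalb and Huttenlocher can be illustrated with a minimal Python sketch. It covers only the single-image merge criterion on a tiny grayscale grid, not the temporally consistent video method of the paper. The function names `segment_graph` and `grid_edges`, the toy image, and the value of the scale parameter `k` are assumptions made for illustration.

```python
def segment_graph(num_nodes, edges, k):
    """Greedy graph-based segmentation (Felzenszwalb-Huttenlocher criterion).

    edges: list of (weight, u, v) tuples.
    k: scale parameter; larger values favor larger regions (illustrative).
    Returns one component label per node.
    """
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    # Largest edge weight used so far inside each component.
    internal = [0.0] * num_nodes

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Visit edges in nondecreasing weight order, as in Kruskal's algorithm.
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        # Merge only if the edge is no heavier than the internal difference
        # of either component plus its size-dependent tolerance k / |C|.
        if w <= min(internal[ru] + k / size[ru], internal[rv] + k / size[rv]):
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            # Edges arrive sorted, so w is the largest weight in the merged set.
            internal[ru] = w
    return [find(i) for i in range(num_nodes)]


def grid_edges(image):
    """4-connected grid edges weighted by absolute intensity difference."""
    h, w = len(image), len(image[0])
    edges = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                edges.append((abs(image[y][x] - image[y][x + 1]), y * w + x, y * w + x + 1))
            if y + 1 < h:
                edges.append((abs(image[y][x] - image[y + 1][x]), y * w + x, (y + 1) * w + x))
    return edges


# Toy 2x4 image: a dark region on the left, a bright region on the right.
image = [
    [10, 11, 200, 201],
    [12, 10, 199, 202],
]
labels = segment_graph(8, grid_edges(image), k=5.0)
print(labels)
```

On this toy input the four dark pixels end up in one region and the four bright pixels in another. Sorting the edges dominates the cost, and the union-find merges are nearly constant time each, which is the sense in which the procedure is quasi-linear.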

Keywords

deep learning; optimization; convolutional networks; superpixels; depth information

Domains

Machine learning [cs.LG]; Neural networks [cs.NE]; Image processing [eess.IV]

Main file

couprie_jmlr2014_depth.pdf (12.7 MB)

Origin: Files produced by the author(s)


https://hal.science/hal-01066586

Submitted on: Tuesday, September 23, 2014, 17:13:34

Last modified on: Thursday, March 28, 2024, 03:26:02

Long-term archiving on: Friday, April 14, 2017, 15:32:11

Dates and versions

hal-01066586, version 1 (23-09-2014)

Identifiers

HAL Id: hal-01066586, version 1

Cite

Camille Couprie, Clément Farabet, Laurent Najman, Yann Lecun. Convolutional Nets and Watershed Cuts for Real-Time Semantic Labeling of RGBD Videos. Journal of Machine Learning Research, 2014, 15, pp. 3489-3511. ⟨hal-01066586⟩

Collections

ENPC IFP CNRS UNIV-MLV LIGM_A3SI PARISTECH LIGM ESIEE-PARIS GDMM UNIV-EIFFEL JSE2024

505 views

435 downloads
