Visual positioning in a world of objects (Positionnement visuel dans un monde d'objets)

Vincent Gaudillière

Abstract

Augmented Reality can be defined as the superimposition onto reality of elements (sounds, 2D and 3D images, videos, etc.) computed in real time by a computer system. In practice, the term refers to the addition of visual elements, either in the observer's field of view through dedicated glasses (e.g. Microsoft Hololens, Magic Leap One), or on a screen through which the observer sees reality (usually a smartphone or tablet). In this research work, we were interested in the deployment of Augmented Reality in an industrial context, and more particularly in the challenges that large industrial environments (factories, plants, ships) pose for image analysis and processing. In particular, we investigated the use of objects of interest present in the scene to recognize the place where the observer is located and then compute their precise position with respect to the environment. Target applications include manufacturing assistance, maintenance assistance, documentation and training. After proposing a functional definition of the concept of place in an industrial environment, as a zone of interaction around an object of interest, we approached place recognition as an image retrieval task in which the similarity between the unknown image and the reference images is measured in two steps. The reference images most similar to the unknown image are then validated by estimating the epipolar geometry between the unknown image and each retrieved image. Both the similarity measurement and the geometry estimation are guided by object-level correspondences computed between regions of interest in the two images.
To compute the camera pose, we then took advantage of the objects of interest present in the scene by modeling them as ellipsoids and their projections in the image as ellipses. Our contributions to the problem of estimating camera pose from ellipse-ellipsoid correspondences are both theoretical and practical. In particular, we have shown that the solutions to the one-ellipsoid problem admit a parametrization and, moreover, that the camera pose estimation problem can be reduced to estimating the camera orientation alone. We have also proposed a robust way to handle the multiple possible matches between the objects detected in the image and the objects present in the 3D scene model.
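To make the ellipse-ellipsoid relationship concrete, the following minimal sketch projects an ellipsoid into an image with the standard projective-geometry identity C* = P Q* Pᵀ, where Q* is the dual quadric of the ellipsoid and P the 3x4 projection matrix. It illustrates only the forward model on which such pose estimation rests, not the method developed in the thesis. The function names, the axis-aligned ellipsoid and the camera parameters are assumptions chosen for illustration.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def ellipsoid_dual_quadric(center, semi_axes):
    """4x4 dual quadric Q* of an axis-aligned ellipsoid (illustrative helper).

    Q* = T diag(a^2, b^2, c^2, -1) T^T, with T the translation to `center`,
    which expands to [[A - c c^T, -c], [-c^T, -1]].
    """
    sq = [s * s for s in semi_axes]
    q = [[(sq[i] if i == j else 0.0) - center[i] * center[j] for j in range(3)]
         + [-center[i]] for i in range(3)]
    q.append([-center[0], -center[1], -center[2], -1.0])
    return q

def project_to_ellipse(q_dual, projection):
    """Project a dual quadric with a 3x4 matrix P; return (center, semi_axes).

    Assumes the camera lies outside the ellipsoid and the ellipsoid is in
    front of it, so that the projected outline is a proper ellipse.
    """
    c_dual = matmul(matmul(projection, q_dual), transpose(projection))
    # A dual conic is defined up to scale: normalise so that C*[2][2] = -1.
    scale = -1.0 / c_dual[2][2]
    c = [[v * scale for v in row] for row in c_dual]
    u, v = -c[0][2], -c[1][2]            # ellipse center in the image
    # M = R diag(a^2, b^2) R^T, recovered from the upper-left 2x2 block.
    m00, m01, m11 = c[0][0] + u * u, c[0][1] + u * v, c[1][1] + v * v
    mean = 0.5 * (m00 + m11)
    dev = math.hypot(0.5 * (m00 - m11), m01)
    # Eigenvalues of M are the squared semi-axes (major first).
    return (u, v), (math.sqrt(mean + dev), math.sqrt(mean - dev))

# Hypothetical camera: focal length 500 px, principal point (320, 240),
# placed at the world origin and looking along +z, so P = K [I | 0].
P = [[500.0, 0.0, 320.0, 0.0],
     [0.0, 500.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

# A unit sphere 5 units in front of the camera, on the optical axis.
Q = ellipsoid_dual_quadric(center=(0.0, 0.0, 5.0), semi_axes=(1.0, 1.0, 1.0))
ellipse_center, ellipse_axes = project_to_ellipse(Q, P)
```

In this configuration the outline is a circle centered on the principal point, with radius f / sqrt(d² - r²) = 500 / sqrt(24) pixels, which gives a quick sanity check. Pose estimation from ellipse-ellipsoid correspondences inverts this relation: given detected ellipses and known ellipsoids, it looks for the projection matrix that makes the two sides agree.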


Visual positioning in a world of objects
