Temporal Difference Rewards for End-to-end Vision-based Active Robot Tracking using Deep Reinforcement Learning - Archive ouverte HAL
Conference paper, Year: 2021

Temporal Difference Rewards for End-to-end Vision-based Active Robot Tracking using Deep Reinforcement Learning

Abstract

Object tracking localizes moving objects in sequences of frames, providing detailed information about the trajectories of objects that appear in a scene. In this paper, we study active object tracking, where a tracker receives an input visual observation and directly outputs the most appropriate control actions to follow the target and keep it in its field of view, thereby unifying visual tracking and control. This is in contrast to conventional tracking approaches, as typically developed by the computer vision community, where the problem of detecting the tracked object in a frame is decoupled from the problem of controlling the camera and/or the robot to follow the object. Deep Reinforcement Learning (DRL) methods hold the credentials for overcoming these issues, since they allow for tackling both problems, i.e., detecting the tracked object and providing control commands, at the same time. However, DRL algorithms require a significantly different training methodology than traditional computer vision models, e.g., they rely on dynamic simulations for training instead of static datasets, and they are notoriously difficult to train to convergence, often requiring reward shaping approaches to increase convergence speed and stability. The main contribution of this paper is a DRL, vision-based active tracking method, along with an appropriately designed reward shaping approach for active tracking problems. The developed methods are evaluated using a state-of-the-art robotics simulator, demonstrating good generalization over various dynamic trajectories of moving objects under a wide range of different setups.
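The abstract does not spell out the shaping scheme, but the title's "temporal difference rewards" is commonly realized as potential-based reward shaping, where the agent receives a bonus equal to the discounted change of a potential function between consecutive states. The sketch below is purely illustrative, not the paper's implementation: the potential `phi`, the error-based state representation, and the function names are all assumptions introduced for this example.

```python
# Illustrative sketch of temporal-difference-style (potential-based) reward
# shaping for an active tracking agent. NOT the paper's exact method: the
# potential function and the (angle_error, distance_error) state encoding
# are hypothetical choices made for this example.

GAMMA = 0.99  # discount factor of the underlying RL problem


def phi(angle_error, distance_error):
    """Hypothetical tracking potential: highest (zero) when the target is
    perfectly centered in the field of view and at the desired distance."""
    return -(abs(angle_error) + abs(distance_error))


def shaped_reward(base_reward, state, next_state):
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s).
    Shaping of this form is known to preserve the optimal policy of the
    original reward while densifying the learning signal."""
    return base_reward + GAMMA * phi(*next_state) - phi(*state)


# Example: between two consecutive frames the tracker reduces both the
# centering and the distance error, so the shaping term is positive and
# rewards the improvement even if the base reward is sparse (here 0).
s, s_next = (0.4, 0.5), (0.1, 0.2)
print(shaped_reward(0.0, s, s_next) > 0)  # True: moving toward the target
```

Densifying a sparse tracking reward this way is one standard answer to the convergence difficulties the abstract mentions: the agent is rewarded for each step of progress rather than only when the target is already well framed.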
Main file: m26787-tiritiris.pdf (333.04 Ko)
Origin: Files produced by the author(s)

Dates and versions

hal-03281187, version 1 (08-07-2021)

Identifiers

  • HAL Id: hal-03281187, version 1

Cite

Pavlos Tiritiris, Nikolaos Passalis, Anastasios Tefas. Temporal Difference Rewards for End-to-end Vision-based Active Robot Tracking using Deep Reinforcement Learning. International Conference on Emerging Techniques in Computational Intelligence, ICETCI 2021, 2021, Virtual, India. ⟨hal-03281187⟩
20 views
82 downloads
