
Developmental Learning of Audio-Visual Integration From Facial Gestures Of a Social Robot

Abstract: We present a robot head with facial gestures, audio, and vision capabilities aimed at the emergence of infant-like social features. For this, we propose a neural architecture that integrates these three modalities through a developmental stage of social interaction with a caregiver. During dyadic interaction with the experimenter, the robot learns to categorize audio-speech gestures for the vowels /a/, /i/, /o/ as an infant would, by linking someone else's facial expressions to its own movements. We show that multimodal integration in the neural network is more robust than unimodal learning, since it compensates for erroneous or noisy information coming from each modality. Consequently, facial mimicry with a partner can be reproduced from redundant audiovisual signals or from noisy information in one modality alone. Statistical experiments with 24 naive participants show the robustness of our algorithm during human-robot interaction in a public environment where many people move and talk continuously. We then discuss our model in light of human-robot communication and the development of social skills and language in infants.
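The robustness claim in the abstract (multimodal integration compensating for a noisy modality) can be illustrated with a minimal sketch. The prototype vectors, feature values, and additive score fusion below are illustrative assumptions, not the paper's learned neural representation; the actual model learns these associations during interaction with a caregiver.

```python
# Minimal sketch of audio-visual fusion for vowel categorization.
# AUDIO_PROTOS / VISUAL_PROTOS are hypothetical per-vowel prototypes,
# not values from the paper.
AUDIO_PROTOS = {"a": [1.0, 0.0, 0.0], "i": [0.0, 1.0, 0.0], "o": [0.0, 0.0, 1.0]}
VISUAL_PROTOS = {"a": [0.9, 0.1, 0.0], "i": [0.1, 0.9, 0.0], "o": [0.0, 0.1, 0.9]}

def similarity(x, proto):
    # Negative squared Euclidean distance as a similarity score.
    return -sum((a - b) ** 2 for a, b in zip(x, proto))

def classify(audio, visual, w_audio=1.0, w_visual=1.0):
    # Fuse evidence from both modalities by summing weighted
    # similarity scores, then pick the best-matching vowel.
    scores = {
        vowel: (w_audio * similarity(audio, AUDIO_PROTOS[vowel])
                + w_visual * similarity(visual, VISUAL_PROTOS[vowel]))
        for vowel in AUDIO_PROTOS
    }
    return max(scores, key=scores.get)

# A clear /i/-like lip shape paired with ambiguous, noisy audio:
noisy_audio = [0.5, 0.4, 0.6]     # no vowel clearly dominates
clean_visual = [0.1, 0.85, 0.05]  # strongly /i/-like

print(classify(noisy_audio, clean_visual))             # fused → "i"
print(classify(noisy_audio, clean_visual, w_visual=0.0))  # audio alone → "o"
```

With audio alone the noisy input is misclassified as /o/, while the fused score recovers /i/, mirroring the abstract's point that the multimodal pathway compensates for a degraded modality.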
Document type: Preprints, Working Papers, ...

Cited literature: 33 references
Contributor: Alexandre Pitti
Submitted on: Friday, July 26, 2019 - 4:22:37 PM
Last modification on: Thursday, March 5, 2020 - 4:25:49 PM


  • HAL Id: hal-02185423, version 1


Oriane Dermy, Sofiane Boucenna, Alexandre Pitti, Arnaud Blanchard. Developmental Learning of Audio-Visual Integration From Facial Gestures Of a Social Robot. 2019. ⟨hal-02185423⟩


