M. Wilson, Six views of embodied cognition, Psychonomic Bulletin & Review, pp.625-636, 2002.

L. Barsalou, Perceptual symbol systems, Behavioral and Brain Sciences, vol.22, issue.04, pp.577-660, 1999.
DOI : 10.1017/S0140525X99002149

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.601.93

A. Cangelosi, Grounding language in action and perception: from cognitive agents to humanoid robots, Physics of Life Reviews, pp.139-151, 2010.

S. Harnad, The symbol grounding problem, Physica D: Nonlinear Phenomena, vol.42, issue.1-3, pp.335-346, 1990.
DOI : 10.1016/0167-2789(90)90087-6

A. Glenberg and M. Kaschak, Grounding language in action, Psychonomic Bulletin & Review, pp.558-565, 2002.

H. McGurk and J. MacDonald, Hearing lips and seeing voices, Nature, vol.264, issue.5588, pp.746-748, 1976.
DOI : 10.1038/264746a0

J. Schwartz, A reanalysis of McGurk data suggests that audiovisual fusion in speech perception is subject-dependent, The Journal of the Acoustical Society of America, vol.127, issue.3, pp.1584-1594, 2010.
DOI : 10.1121/1.3293001

URL : https://hal.archives-ouvertes.fr/hal-00442364

S. Waxman and D. Markow, Words as invitations to form categories: Evidence from 12- to 13-month-old infants, Cognitive Psychology, vol.29, issue.3, pp.257-302, 1995.

G. Lupyan, D. Rakison, and J. McClelland, Language is not Just for Talking: Redundant Labels Facilitate Learning of Novel Categories, Psychological Science, vol.18, issue.12, pp.1077-1083, 2007.
DOI : 10.1111/j.0956-7976.2005.00787.x

T. Belpaeme and A. Morse, Word And Category Learning in a Continuous Semantic Domain: Comparing Cross-Situational and Interactive Learning, Advances in Complex Systems, 2012.

T. Cederborg and P. Oudeyer, From Language to Motor Gavagai: Unified Imitation Learning of Multiple Linguistic and Nonlinguistic Sensorimotor Skills, IEEE Transactions on Autonomous Mental Development, vol.5, issue.3, pp.222-239, 2013.
DOI : 10.1109/TAMD.2013.2279277

URL : https://hal.archives-ouvertes.fr/hal-00910982

E. Markman, Constraints Children Place on Word Meanings, Cognitive Science, vol.14, issue.1, pp.57-77, 1990.
DOI : 10.1207/s15516709cog1401_4

M. Brent, Speech segmentation and word discovery: a computational perspective, Trends in Cognitive Sciences, vol.3, issue.8, 1999.
DOI : 10.1016/S1364-6613(99)01350-9

R. Blake, A neural theory of binocular rivalry, Psychological Review, pp.145-167, 1989.

D. Leopold and N. Logothetis, Multistable phenomena: changing views in perception, Trends in Cognitive Sciences, pp.254-264, 1999.

J. Schwartz, N. Grimault, J. Hupé, B. Moore, and D. Pressnitzer, Multistability in perception: binding sensory modalities, an overview, Philosophical Transactions of the Royal Society B: Biological Sciences, vol.367, issue.1591, pp.896-905, 2012.
DOI : 10.1167/10.11.1

URL : https://hal.archives-ouvertes.fr/hal-00642308

E. Cherry, Some Experiments on the Recognition of Speech, with One and with Two Ears, The Journal of the Acoustical Society of America, vol.25, issue.5, pp.975-985, 1953.
DOI : 10.1121/1.1907229

J. Schwartz, F. Berthommier, and C. Savariaux, Seeing to hear better: evidence for early audio-visual interactions in speech identification, Cognition, vol.93, issue.2, pp.69-78, 2004.
DOI : 10.1016/j.cognition.2004.01.006

URL : https://hal.archives-ouvertes.fr/hal-00186797

D. Sodoyer, L. Girin, C. Jutten, and J. Schwartz, Developing an audio-visual speech source separation algorithm, Speech Communication, vol.44, issue.1-4, pp.113-125, 2004.
DOI : 10.1016/j.specom.2004.10.002

URL : https://hal.archives-ouvertes.fr/hal-00186591

G. Massera, E. Tuci, T. Ferrauto, and S. Nolfi, The Facilitatory Role of Linguistic Instructions on Developing Manipulation Skills, IEEE Computational Intelligence Magazine, vol.5, issue.3, pp.33-42, 2010.
DOI : 10.1109/MCI.2010.937321

N. Akhtar and L. Montague, Early lexical acquisition: the role of cross-situational learning, First Language, vol.19, issue.57, pp.347-358, 1999.
DOI : 10.1177/014272379901905703

L. Smith and C. Yu, Infants rapidly learn word-referent mappings via cross-situational statistics, Cognition, vol.106, issue.3, pp.1558-1568, 2008.
DOI : 10.1016/j.cognition.2007.06.010

B. Landau, L. Smith, and S. Jones, Object perception and object naming in early development, Trends in Cognitive Sciences, pp.19-24, 1998.

A. Cangelosi, G. Metta, G. Sagerer, S. Nolfi, C. Nehaniv et al., Integration of Action and Language Knowledge: A Roadmap for Developmental Robotics, IEEE Transactions on Autonomous Mental Development, vol.2, issue.3, pp.167-195, 2010.
DOI : 10.1109/TAMD.2010.2053034

J. Konczak, On the notion of motor primitives in humans and robots, International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems, pp.47-53, 2005.

F. Mussa-Ivaldi and E. Bizzi, Motor learning through the combination of primitives, Philosophical Transactions of the Royal Society B: Biological Sciences, vol.355, issue.1404, pp.1755-1769, 2000.
DOI : 10.1098/rstb.2000.0733

M. Tresch and A. Jarc, The case for and against muscle synergies, Current Opinion in Neurobiology, pp.601-608, 2009.

E. Tuci, T. Ferrauto, A. Zeschel, G. Massera, and S. Nolfi, An Experiment on Behaviour Generalisation and the Emergence of Linguistic Compositionality in Evolving Robots, IEEE Transactions on Autonomous Mental Development, vol.3, issue.2, pp.1-14, 2011.

B. Wrede, K. Rohlfing, J. Steil, S. Wrede, P. Oudeyer et al., Towards robots with teleological action and language understanding, Humanoids 2012 Workshop on Developmental Robotics: Can developmental robotics yield human-like cognitive abilities, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00788627

L. Steels, The symbol grounding problem has been solved, so what's next?, Symbols and Embodiment: Debates on Meaning and Cognition, 2008.
DOI : 10.1093/acprof:oso/9780199217274.003.0012

D. Yurovsky, C. Yu, and L. Smith, Statistical Speech Segmentation and Word Learning in Parallel: Scaffolding from Child-Directed Speech, Frontiers in Psychology, vol.3, 2012.
DOI : 10.3389/fpsyg.2012.00374

O. Mangin, The Choreography 2 dataset, 2013.

O. Mangin and P. Oudeyer, Learning to recognize parallel combinations of human motion primitives with linguistic descriptions using non-negative matrix factorization, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012.
DOI : 10.1109/IROS.2012.6385641

URL : https://hal.archives-ouvertes.fr/hal-00764353

O. Mangin and P. Oudeyer, Learning semantic components from subsymbolic multimodal perception, 2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL), 2013.
DOI : 10.1109/DevLrn.2013.6652563

URL : https://hal.archives-ouvertes.fr/hal-00842453

T. Altosaar, L. Bosch, G. Aimetti, C. Koniaris, K. Demuynck et al., A Speech Corpus for Modeling Language Acquisition: CAREGIVER, Language Resources and Evaluation (LREC), pp.1062-1068, 2008.

L. Boves, L. Bosch, and R. Moore, ACORNS - towards computational modeling of communication and recognition skills, 6th IEEE International Conference on Cognitive Informatics, pp.349-356, 2007.
DOI : 10.1109/COGINF.2007.4341909

N. Lyubova and D. Filliat, Developmental approach for interactive object discovery, The 2012 International Joint Conference on Neural Networks (IJCNN), 2012.
DOI : 10.1109/IJCNN.2012.6252606

URL : https://hal.archives-ouvertes.fr/hal-00755298

P. Paatero and U. Tapper, Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics, vol.5, issue.2, pp.111-126, 1994.
DOI : 10.1002/env.3170050203

D. Lee and H. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, issue.6755, pp.788-791, 1999.

W. Xu, X. Liu, and Y. Gong, Document clustering based on non-negative matrix factorization, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '03, pp.267-273, 2003.
DOI : 10.1145/860435.860485

L. Bosch, H. Van-hamme, and L. Boves, Unsupervised detection of words questioning the relevance of segmentation, Speech Analysis and Processing for Knowledge Discovery. ITRW ISCA, 2008.

J. Driesen, L. Bosch, and H. Van-hamme, Adaptive Non-negative Matrix Factorization in a Computational Model of Language Acquisition, pp.1-4, 2009.

T. Joachims, Text categorization with Support Vector Machines: Learning with many relevant features, LS VIII-Report 23, Universität Dortmund, 1997.
DOI : 10.1007/BFb0026683

J. Sivic and A. Zisserman, Efficient Visual Search for Objects in Videos, Proceedings of the IEEE, vol.96, issue.4, pp.548-566, 2008.
DOI : 10.1109/JPROC.2008.916343

S. Calinon and A. Billard, Statistical Learning by Imitation of Competing Constraints in Joint Space and Task Space, Advanced Robotics, vol.23, issue.15, pp.2059-2076, 2009.
DOI : 10.1163/016918609X12529294461843

D. Kulic, H. Imagawa, and Y. Nakamura, Online acquisition and visualization of motion primitives for humanoid robots, RO-MAN 2009, The 18th IEEE International Symposium on Robot and Human Interactive Communication, pp.1210-1215, 2009.
DOI : 10.1109/ROMAN.2009.5326307

H. Van-hamme, HAC-models: a Novel Approach to Continuous Speech Recognition, pp.2554-2557, 2008.

J. Driesen, J. Gemmeke, and H. Van-hamme, Data-driven speech representations for NMF-based word learning, 2012.

J. Driesen, Discovering words in speech using matrix factorization, 2012.

O. Mangin, The Emergence of Multimodal Concepts: From Perceptual Motion Primitives to Grounded Acoustic Words, 2014.
URL : https://hal.archives-ouvertes.fr/tel-01148936

H. Bay, T. Tuytelaars, and L. Van-gool, SURF: Speeded up robust features, Computer Vision - ECCV 2006, pp.404-417, 2006.
DOI : 10.1007/11744023_32

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.679.3046

B. Micusik and J. Košecká, Semantic segmentation of street scenes by superpixel co-occurrence and 3D geometry, IEEE Workshop on Video-Oriented Object and Event Classification (VOEC), held jointly with International Conf. on Computer Vision (ICCV), Japan, 2009.

D. Filliat, Interactive learning of visual topological navigation, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.248-254, 2008.
DOI : 10.1109/IROS.2008.4650681

URL : https://hal.archives-ouvertes.fr/hal-00641356

O. Mangin, D. Filliat, and P. Oudeyer, A bag-of-features framework for incremental learning of speech invariants in unsegmented audio streams, Tenth International Conference on Epigenetic Robotics, pp.73-80, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00541802

J. Driesen, H. Van-hamme, and B. Kleijn, Learning from images and speech with Non-negative Matrix Factorization enhanced by input space scaling, 2010 IEEE Spoken Language Technology Workshop, 2010.
DOI : 10.1109/SLT.2010.5700813

V. Stouten, K. Demuynck, and H. Van-hamme, Discovering Phone Patterns in Spoken Utterances by Non-Negative Matrix Factorization, IEEE Signal Processing Letters, vol.15, pp.131-134, 2008.
DOI : 10.1109/LSP.2007.911723

T. Cover and J. Thomas, Elements of information theory, 1991.

J. Munkres, Algorithms for the Assignment and Transportation Problems, Journal of the Society for Industrial and Applied Mathematics, vol.5, issue.1, pp.32-38, 1957.
DOI : 10.1137/0105003

D. Roy, Learning from Sights and Sounds: A Computational Model, Massachusetts Institute of Technology, 1999.

D. Roy and A. Pentland, Learning words from sights and sounds: A computational model, Cognitive Science, pp.113-146, 2002.

C. Yu and D. Ballard, A Multimodal Learning Interface for Grounding Spoken Language in Sensory Perceptions, Transactions on Applied Perception, issue.1, pp.57-80, 2004.

N. Iwahashi, Language acquisition through a human-robot interface by combining speech, visual, and behavioral information, Information Sciences, vol.156, issue.1-2, pp.109-121, 2003.
DOI : 10.1016/S0020-0255(03)00167-1

Y. Sugita and J. Tani, Learning Semantic Combinatoriality from the Interaction between Linguistic and Behavioral Processes, Adaptive Behavior, vol.13, issue.1, pp.33-43, 2005.
DOI : 10.1177/105971230501300102

R. Lienhart, S. Romberg, and E. Hörster, Multilayer pLSA for multimodal image retrieval, Proceeding of the ACM International Conference on Image and Video Retrieval, CIVR '09, p.1, 2009.
DOI : 10.1145/1646396.1646408

Z. Akata, C. Thurau, and C. Bauckhage, Non-negative Matrix Factorization in Multimodality Data for Segmentation and Label Prediction, 16th Computer Vision Winter Workshop, Mitterberg, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652879

J. Benabdallah, J. Caicedo, F. Gonzalez, and O. Nasraoui, Multimodal Image Annotation Using Non-negative Matrix Factorization, 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pp.128-135, 2010.
DOI : 10.1109/WI-IAT.2010.293

N. Srivastava and R. Salakhutdinov, Multimodal Learning with Deep Boltzmann Machines, Advances in Neural Information Processing Systems 25, pp.2222-2230, 2012.

G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. Senior, Recent advances in the automatic recognition of audiovisual speech, Proceedings of the IEEE, vol.91, issue.9, pp.1306-1326, 2003.
DOI : 10.1109/JPROC.2003.817150

K. Saenko and T. Darrell, Object Category Recognition Using Probabilistic Fusion of Speech and Image Classifiers, 4th Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI), 2007.
DOI : 10.1007/978-3-540-78155-4_4

L. Steels, The Talking Heads Experiment, 1999.

L. Steels and F. Kaplan, Bootstrapping grounded word semantics, Linguistic Evolution through Language Acquisition: Formal and Computational Models, 2002.

A. Droniou, S. Ivaldi, and O. Sigaud, Deep unsupervised network for multimodal perception, representation and classification, Robotics and Autonomous Systems, vol.71, 2014.
DOI : 10.1016/j.robot.2014.11.005

URL : https://hal.archives-ouvertes.fr/hal-01083521

M. Versteegh, L. Bosch, and L. Boves, Modelling novelty preference in word learning, Interspeech 2011: 12th Annual Conference of the International Speech Communication Association, pp.761-764, 2011.

P. Oudeyer and L. Smith, How Evolution may work through Curiosity-driven Developmental Process, Topics in Cognitive Science, 2014.