S. Pinker, Language Learnability and Language Development, 1984.

N. Akhtar and L. Montague, Early lexical acquisition: the role of cross-situational learning, pp.347-358, 1999.

A. Bandura, Social learning theory, Social Learning Theory, pp.1-46, 1971.

J. F. Fontanari and A. Cangelosi, Cross-situational and supervised learning in the emergence of communication, Interaction Studies, vol.12, issue.1, pp.119-133, 2011.

J. Piaget, Play, Dreams and Imitation in Childhood, ser. Developmental psychology. Routledge, 1999.

M. Tomasello, The Cultural Origins of Human Cognition, 1999.

J. Call and M. Carpenter, Three sources of information in social learning, Imitation in animals and artifacts, pp.211-228, 2002.

D. H. Grollman and O. C. Jenkins, Sparse incremental learning for interactive robot control policy estimation, 2008 IEEE International Conference on Robotics and Automation, pp.3315-3320, 2008.
DOI : 10.1109/ROBOT.2008.4543716

S. Calinon and A. G. Billard, What is the Teacher's Role in Robot Programming by Demonstration? Toward Benchmarks for Improved Learning, Interaction Studies, vol.8, pp.441-464, 2007.

A. L. Thomaz and C. Breazeal, Teachable robots: Understanding human teaching behavior to build more effective robot learners, Artificial Intelligence, vol.172, issue.6-7, pp.716-737, 2008.
DOI : 10.1016/j.artint.2007.09.009

R. Maclin, J. Shavlik, L. Torrey, T. Walker, and E. Wild, Giving advice about preferred actions to reinforcement learners via knowledge-based kernel regression, Proceedings of the 20th National Conference on Artificial Intelligence, pp.819-824, 2005.

P. Y. Oudeyer, F. Kaplan, and V. V. Hafner, Intrinsic Motivation Systems for Autonomous Mental Development, IEEE Transactions on Evolutionary Computation, vol.11, issue.2, pp.265-286, 2007.
DOI : 10.1109/TEVC.2006.890271

P. Chandrashekhariah, G. Spina, and J. Triesch, Let it Learn: A Curious Vision System for Autonomous Object Learning, VISAPP, issue.2, p.169, 2013.

Y. Chen and D. Filliat, Cross-situational noun and adjective learning in an interactive scenario, 2015 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 2015.
DOI : 10.1109/DEVLRN.2015.7346129
URL : https://hal.archives-ouvertes.fr/hal-01170674

Y. Chen, J. Bordes, and D. Filliat, An experimental comparison between NMF and LDA for active cross-situational object-word learning, 2016 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 2016.
DOI : 10.1109/DEVLRN.2016.7846822
URL : https://hal.archives-ouvertes.fr/hal-01370853

G. Kachergis, C. Yu, and R. M. Shiffrin, Temporal contiguity in cross-situational statistical learning, 2009.

P. J. Gorniak, The affordance-based concept, 2005.

S. Harnad, The symbol grounding problem, Physica D: Nonlinear Phenomena, vol.42, issue.1-3, pp.335-346, 1990.
DOI : 10.1016/0167-2789(90)90087-6

L. Steels, Evolving grounded communication for robots, Trends in Cognitive Sciences, vol.7, issue.7, pp.308-312, 2003.
DOI : 10.1016/S1364-6613(03)00129-3

S. D. Larson, Intrinsic Representation: Bootstrapping Symbols from Experience, 2004.
DOI : 10.1007/978-3-540-24840-8_15

C. Yu, L. B. Smith, and A. F. Pereira, Grounding word learning in multimodal sensorimotor interaction, Proceedings of the 30th annual conference of the cognitive science society, pp.1017-1022, 2008.

N. Mavridis, C. Datta, S. Emami, C. Benabdelkader, A. Tanoto et al., FaceBots, Proceedings of the 4th ACM/IEEE international conference on Human robot interaction, HRI '09, pp.195-196, 2009.
DOI : 10.1145/1514095.1514132

K. Gold and B. Scassellati, Grounded pronoun learning and pronoun reversal, Proceedings of the 5th International Conference on Development and Learning, 2006.

D. K. Roy, A computational model of word learning from multimodal sensory input, Proceedings of the International Conference of Cognitive Modeling (ICCM2000), 2000.

N. Mavridis and D. K. Roy, Grounded Situation Models for Robots: Where words and percepts meet, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.4690-4697, 2006.
DOI : 10.1109/IROS.2006.282258

T. Spexard, S. Li, B. Wrede, J. Fritsch, G. Sagerer et al., BIRON, where are you? Enabling a robot to learn new places in a real home environment by integrating spoken dialog and visual localization, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.934-940, 2006.
DOI : 10.1109/IROS.2006.281770

T. Regier and L. A. Carlson, Grounding spatial language in perception: An empirical and computational investigation., Journal of Experimental Psychology: General, vol.130, issue.2, p.273, 2001.
DOI : 10.1037/0096-3445.130.2.273

D. K. Roy, Learning visually grounded words and syntax for a scene description task, Computer Speech & Language, vol.16, issue.3-4, pp.353-385, 2002.
DOI : 10.1016/S0885-2308(02)00024-4

K. R. Coventry and S. C. Garrod, Saying, seeing and acting: The psychological semantics of spatial prepositions, 2004.

M. Skubic, D. Perzanowski, S. Blisard, A. Schultz, W. Adams et al., Spatial Language for Human-Robot Dialogs, IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), vol.34, issue.2, pp.154-167, 2004.
DOI : 10.1109/TSMCC.2004.826273

H. Zender, O. M. Mozos, P. Jensfelt, G. Kruijff, and W. Burgard, Conceptual spatial representations for indoor mobile robots, Robotics and Autonomous Systems, vol.56, issue.6, pp.493-502, 2008.
DOI : 10.1016/j.robot.2008.03.007

S. Tellex, T. Kollar, S. Dickerson, M. R. Walter, A. G. Banerjee et al., Approaching the Symbol Grounding Problem with Probabilistic Graphical Models, AI Magazine, vol.32, issue.4, pp.64-76, 2011.
DOI : 10.1609/aimag.v32i4.2384

S. Tellex, P. Thaker, J. Joseph, and N. Roy, Learning perceptually grounded word meanings from unaligned parallel data, Machine Learning, pp.151-167, 2014.

S. Coradeschi, A. Loutfi, and B. Wrede, A Short Review of Symbol Grounding in Robotic and Intelligent Systems, KI - Künstliche Intelligenz, vol.16, issue.4, pp.129-136, 2013.
DOI : 10.1162/artl_a_00007

P. Vogt, The physical symbol grounding problem, Cognitive Systems Research, vol.3, issue.3, pp.429-457, 2002.
DOI : 10.1016/S1389-0417(02)00051-7

A. Cangelosi, The grounding and sharing of symbols, Pragmatics & Cognition, vol.14, issue.2, pp.275-285, 2006.
DOI : 10.1075/bct.16.07can

C. Yu and D. H. Ballard, On the integration of grounding language and learning objects, AAAI, pp.488-493, 2004.

O. Mangin, The Emergence of Multimodal Concepts, 2014.
URL : https://hal.archives-ouvertes.fr/tel-01148936

T. Altosaar, L. Bosch, G. Aimetti, C. Koniaris, K. Demuynck et al., A speech corpus for modeling language acquisition: Caregiver, LREC, 2010.

T. Araki, T. Nakamura, T. Nagai, K. Funakoshi, M. Nakano et al., Autonomous acquisition of multimodal information for online object concept formation by robots, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.1540-1547, 2011.
DOI : 10.1109/IROS.2011.6094814

K. Noda, H. Arie, Y. Suga, and T. Ogata, Multimodal integration learning of robot behavior using deep neural networks, Robotics and Autonomous Systems, vol.62, issue.6, pp.721-736, 2014.
DOI : 10.1016/j.robot.2014.03.003

D. K. Roy and A. Pentland, Learning words from sights and sounds: a computational model, Cognitive Science, vol.55, issue.3, pp.113-146, 2002.
DOI : 10.2307/1130007

T. M. Cover and J. A. Thomas, Elements of information theory, 2012.

W. Schueller and P. Oudeyer, Active learning strategies and active control of complexity growth in naming games, 2015 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 2015.
DOI : 10.1109/DEVLRN.2015.7346144
URL : https://hal.archives-ouvertes.fr/hal-01202654

G. Salton and C. Buckley, Term-weighting approaches in automatic text retrieval, Information Processing & Management, vol.24, issue.5, pp.513-523, 1988.
DOI : 10.1016/0306-4573(88)90021-0

D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, pp.788-791, 1999.

D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res., vol.3, pp.993-1022, 2003.

D. Gentner and L. L. Namy, Comparison in the Development of Categories, Cognitive Development, vol.14, issue.4, pp.487-513, 1999.
DOI : 10.1016/S0885-2014(99)00016-7

J. Yue-Hei Ng, F. Yang, and L. S. Davis, Exploiting local features from deep networks for image retrieval, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.53-61, 2015.

G. Kachergis, C. Yu, and R. M. Shiffrin, A Bootstrapping Model of Frequency and Context Effects in Word Learning, Cognitive Science, vol.37, issue.1, pp.590-622, 2017.
DOI : 10.1111/cogs.12035

Y. Gong, Q. Ke, M. Isard, and S. Lazebnik, A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics, International Journal of Computer Vision, vol.22, issue.12, pp.210-233, 2014.
DOI : 10.1109/TPAMI.2008.127

K. Kersting, M. Wahabzada, C. Thurau, and C. Bauckhage, Hierarchical Convex NMF for Clustering Massive Data, pp.253-268, 2010.

D. M. Blei, T. L. Griffiths, M. I. Jordan, and J. B. Tenenbaum, Hierarchical topic models and the nested Chinese restaurant process, Advances in Neural Information Processing Systems, 2004.