A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos et al., Deep speech: Scaling up end-to-end speech recognition, 2014.

Y. Miao, M. Gowayyed, and F. Metze, EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.167-174, 2015.
DOI : 10.1109/ASRU.2015.7404790
URL : http://arxiv.org/abs/1507.08240

J. Glass, Towards unsupervised speech processing, 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA), pp.1-4, 2012.
DOI : 10.1109/ISSPA.2012.6310546

A. Jansen, E. Dupoux, S. Goldwater, M. Johnson, S. Khudanpur et al., A summary of the 2012 JHU CLSP Workshop on zero resource speech technologies and models of early language acquisition, Proceedings of ICASSP 2013, 2013.

G. Adda, S. Stüker, M. Adda-Decker, O. Ambouroue, L. Besacier et al., Breaking the unwritten language barrier: The BULB project, Proceedings of SLTU (Spoken Language Technologies for Under-Resourced Languages), 2016.
DOI : 10.1016/j.procs.2016.04.023
URL : https://doi.org/10.1016/j.procs.2016.04.023

L. Duong, A. Anastasopoulos, D. Chiang, S. Bird, and T. Cohn, An Attentional Model for Speech Translation Without Transcription, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.949-959, 2016.
DOI : 10.18653/v1/N16-1109

A. Bérard, O. Pietquin, C. Servan, and L. Besacier, Listen and translate: A proof of concept for end-to-end speech-to-text translation, NIPS Workshop on end-to-end learning for speech and audio processing, 2016.

L. Besacier, B. Zhou, and Y. Gao, Towards speech translation of non written languages, 2006 IEEE Spoken Language Technology Workshop, pp.222-225, 2006.
DOI : 10.1109/SLT.2006.326795

E. Dupoux, Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner, 2016.

M. Versteegh, R. Thiolliere, T. Schatz, X. N. Cao, X. Anguera et al., The zero resource speech challenge 2015, Proc. of Interspeech, 2015.

M. Versteegh, X. Anguera, A. Jansen, and E. Dupoux, The Zero Resource Speech Challenge 2015: Proposed Approaches and Results, Procedia Computer Science: Proceedings of SLTU 2016, pp.67-72, 2016.
DOI : 10.1016/j.procs.2016.04.031
URL : https://doi.org/10.1016/j.procs.2016.04.031

L. Lisker and A. S. Abramson, Crosslanguage Study of Voicing in Initial Stops, The Journal of the Acoustical Society of America, vol.35, issue.11, pp.384-422, 1964.
DOI : 10.1121/1.2142685

J. F. Werker and R. C. Tees, Cross-language speech perception: Evidence for perceptual reorganization during the first year of life, Infant Behavior and Development, vol.7, issue.1, pp.49-63, 1984.

P. K. Kuhl, Human adults and human infants show a perceptual magnet effect for the prototypes of speech categories, monkeys do not, Attention, Perception, & Psychophysics, vol.50, issue.2, pp.93-107, 1991.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, 2011.

D. Wang, X. Zhang, and Z. Zhang, THCHS-30: A free Chinese speech corpus, 2015.

E. Gauthier, L. Besacier, S. Voisin, M. Melese, and U. P. Elingui, Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof, LREC, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01350037

T. Schatz, V. Peddinti, F. Bach, A. Jansen, H. Hermansky et al., Evaluating speech features with the Minimal-Pair ABX task (I): Analysis of the classical MFC/PLP pipeline, INTERSPEECH, 2013.

T. Schatz, V. Peddinti, X. Cao, F. Bach, H. Hermansky et al., Evaluating speech features with the Minimal-Pair ABX task (II): Resistance to noise, INTERSPEECH, 2014.

H. Chen, C. Leung, L. Xie, B. Ma, and H. Li, Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: A feasibility study, INTERSPEECH, 2015.

M. Heck, S. Sakti, and S. Nakamura, Feature optimized DPGMM clustering for unsupervised subword modeling: A contribution to ZeroSpeech 2017

J. Chang and J. W. Fisher III, Parallel sampling of DP mixture models using sub-cluster splits, Advances in Neural Information Processing Systems, pp.620-628, 2013.

T. Pellegrini, C. Manenti, and J. Pinquier, Unsupervised discovery of sub-lexical units in speech based on ZCA and k-means

H. Chen, C. Leung, L. Xie, B. Ma, and H. Li, Multilingual bottle-neck feature learning from untranscribed speech

T. Ansari, S. Singh, R. Kumar, and S. Ganapathy, Deep learning methods for unsupervised acoustic modeling: LEAP submission to ZeroSpeech challenge 2017

T. Ansari, R. Kumar, S. Singh, S. Ganapathy, and S. Devi, Unsupervised HMM posteriograms for language independent acoustic modeling in zero resource conditions

R. Thiolliere, E. Dunbar, G. Synnaeve, M. Versteegh, and E. Dupoux, A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling, INTERSPEECH, pp.3179-3183, 2015.

D. Renshaw, H. Kamper, A. Jansen, and S. Goldwater, A comparison of neural network methods for unsupervised representation learning on the zero resource speech challenge, Sixteenth Annual Conference of the International Speech Communication Association, 2015.

Y. Yuan, C. Leung, L. Xie, H. Chen, B. Ma et al., Extracting bottleneck features and word-like pairs from untranscribed speech for feature representations

A. Jansen and B. Van Durme, Efficient spoken term discovery using randomized algorithms, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp.401-406, 2011.
DOI : 10.1109/ASRU.2011.6163965
URL : http://www.cs.jhu.edu/%7Evandurme/papers/JansenVanDurmeASRU11.pdf

H. Shibata, T. Kato, T. Shinozaki, and S. Watanabe, Composite embedding systems for ZeroSpeech2017 track 1

S. Goldwater, T. L. Griffiths, and M. Johnson, A Bayesian framework for word segmentation: Exploring the effects of context, Cognition, vol.112, issue.1, pp.21-54, 2009.
DOI : 10.1016/j.cognition.2009.03.008

H. Kamper, K. Livescu, and S. Goldwater, An embedded segmental k-means model for unsupervised segmentation and clustering of speech