N. Tishby, F. C. Pereira, and W. Bialek, The information bottleneck method, The 37th annual Allerton Conference on Communication, Control, and Computing, pp.368-377, 1999.

K. Veselý, M. Karafiát, F. Grézl, M. Janda, and E. Egorova, The language-independent bottleneck features, Proc. Spoken Language Technology Workshop (SLT), pp.336-341, 2012.

D. Yu and M. L. Seltzer, Improved bottleneck features using pretrained deep neural networks, 2011.

D. P. Kingma and M. Welling, Auto-encoding variational bayes, 2013.

A. van den Oord, O. Vinyals, and K. Kavukcuoglu, Neural Discrete Representation Learning, 2017.

J. Chorowski, R. J. Weiss, S. Bengio, and A. van den Oord, Unsupervised speech representation learning using wavenet autoencoders, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, pp.2041-2053, 2019.

E. Dunbar, R. Algayres, J. Karadayi, M. Bernard, J. Benjumea et al., The Zero Resource Speech Challenge 2019: TTS Without T, Proc. Interspeech, pp.1088-1092, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02274112

A. Tjandra, B. Sisman, M. Zhang, S. Sakti, H. Li et al., VQVAE Unsupervised Unit Discovery and Multi-Scale Code2Spec Inverter for Zerospeech Challenge 2019, Proc. Interspeech, pp.1118-1122, 2019.

R. Eloff, A. Nortje, B. van Niekerk, A. Govender, L. Nortje et al., Unsupervised Acoustic Unit Discovery for Speech Synthesis Using Discrete Latent-Variable Neural Networks, Proc. Interspeech, pp.1103-1107, 2019.

E. Jang, S. Gu, and B. Poole, Categorical Reparameterization with Gumbel-Softmax, 2016.

Y. Bengio, Estimating or Propagating Gradients Through Stochastic Neurons, 2013.

J. Chorowski, N. Chen, R. Marxer, H. Dolfing, A. Łańcucki et al., Unsupervised neural segmentation and clustering for unit discovery in sequential data, Perception as Generative Reasoning Workshop, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02399138

S. Ioffe and C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015.

H. Sak, A. W. Senior, and F. Beaufays, Long short-term memory recurrent neural network architectures for large scale acoustic modeling, INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, pp.338-342, 2014.

D. Arthur and S. Vassilvitskii, K-means++: The advantages of careful seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.1027-1035, 2007.

J. S. Vitter, Random sampling with a reservoir, ACM Trans. Math. Softw, vol.11, issue.1, pp.37-57, 1985.

A. Roy, A. Vaswani, A. Neelakantan, and N. Parmar, Theory and Experiments on Vector Quantized Autoencoders, 2018.

A. Razavi, A. van den Oord, and O. Vinyals, Generating Diverse High-Fidelity Images with VQ-VAE-2, 2019.

H. J. G. A. Dolfing, J. Bellegarda, J. Chorowski, R. Marxer, and A. Laurent, The "ScribbleLens" Dutch historical handwriting corpus, Under review for: International Conference on Frontiers of Handwriting Recognition (ICFHR), 2020.

D. Amodei, S. Ananthanarayanan, R. Anubhai et al., Deep speech 2: End-to-end speech recognition in English and Mandarin, Proceedings of The 33rd International Conference on Machine Learning, vol.48, pp.173-182, 2016.

A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks, Proceedings of the 23rd International Conference on Machine Learning, pp.369-376, 2006.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The kaldi speech recognition toolkit, IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, p.11, 2011.

Open Speech and Language Resources

A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu, Pixel recurrent neural networks, Proceedings of the 33rd International Conference on Machine Learning, vol.48, pp.1747-1756, 2016.

D. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, 2014.

B. T. Polyak and A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM Journal on Control and Optimization, vol.30, issue.4, pp.838-855, 1992.

B. A. Olshausen and D. J. Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature, vol.381, issue.6583, pp.607-609, 1996.

D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, issue.6755, pp.788-791, 1999.

A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, Deep Variational Information Bottleneck, 2016.

H. Wu and M. Flierl, Variational Information Bottleneck on Vector Quantized Autoencoders, 2018.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, issue.3-4, pp.229-256, 1992.

G. Tucker, A. Mnih, C. J. Maddison, J. Lawson, and J. Sohl-Dickstein, REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models, Advances in Neural Information Processing Systems, vol.30, pp.2627-2636, 2017.

C. K. Sønderby, B. Poole, and A. Mnih, Continuous relaxation training of discrete latent variable image models

C. Lee and J. Glass, A Nonparametric Bayesian Approach to Acoustic Model Discovery, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.40-49, 2012.

L. Ondel, L. Burget, and J. Černocký, Variational Inference for Acoustic Unit Discovery, Procedia Computer Science, vol.81, pp.80-86, 2016.

J. Ebbers, J. Heymann, L. Drude, T. Glarner, R. Haeb-Umbach et al., Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery, pp.488-492, 2017.

T. Glarner, P. Hanebrink, J. Ebbers, and R. Haeb-Umbach, Full Bayesian Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery, pp.2688-2692, 2018.

M. J. Johnson, D. Duvenaud, A. B. Wiltschko, S. R. Datta, and R. P. Adams, Composing graphical models with neural networks for structured representations and fast inference, 2016.