A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, Proc. ICML, pp.369-376, 2006.

A. Graves, A. Mohamed, and G. Hinton, Speech recognition with deep recurrent neural networks, Proc. ICASSP, Vancouver, pp.6645-6649, 2013.

L. Lu, X. Zhang, K. Cho, and S. Renals, A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition, Proc. Interspeech, 2015.

C. Chiu, T. N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen et al., State-of-the-art speech recognition with sequence-to-sequence models, Proc. ICASSP, pp.4774-4778, 2018.

T. Nagamine, M. L. Seltzer, and N. Mesgarani, Exploring how deep neural networks form phonemic categories, Proc. Interspeech, pp.1912-1916, 2015.

T. Pellegrini and S. Mouysset, Inferring phonemic classes from CNN activation maps using clustering techniques, Proc. Interspeech, pp.1287-1290, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01474886

K. Krishna, S. Toshniwal, and K. Livescu, Hierarchical multitask learning for CTC-based speech recognition, 2018.

Y. Belinkov and J. Glass, Analyzing hidden representations in end-to-end automatic speech recognition systems, Advances in Neural Information Processing Systems, pp.2441-2451, 2017.

K. Rao and H. Sak, Multi-accent speech recognition with hierarchical grapheme based models, Proc. ICASSP, pp.4815-4819, 2017.

S. Kim, T. Hori, and S. Watanabe, Joint CTC-attention based end-to-end speech recognition using multi-task learning, Proc. ICASSP, pp.4835-4839, 2017.

R. Sanabria and F. Metze, Hierarchical multitask learning with CTC, 2018 IEEE Spoken Language Technology Workshop (SLT), pp.485-490, 2018.

S. Fernández, A. Graves, and J. Schmidhuber, Sequence labelling in structured domains with hierarchical recurrent neural networks, Proc. IJCAI, pp.774-779, 2007.

A. Jimenez, B. Elizalde, and B. Raj, Sound event classification using ontology-based neural networks, Proc. NeurIPS, Montreal, 2018.

J. Garofolo, D. Graff, D. Paul, and D. Pallett, CSR-I (WSJ0) complete LDC93S6A, Web Download. Philadelphia: Linguistic Data Consortium, 1993.

M. Mohri, F. Pereira, and M. Riley, Weighted finite-state transducers in speech recognition, Computer Speech & Language, vol.16, issue.1, pp.69-88, 2002.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, Proc. ASRU, 2011.

Y. Miao, M. Gowayyed, and F. Metze, EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding, Proc. ASRU, pp.167-174, 2015.

J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, Gated feedback recurrent neural networks, Proc. ICML, pp.2067-2075, 2015.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang et al., Automatic differentiation in PyTorch, Proc. NIPS, Long Beach, 2017.

A. Graves and N. Jaitly, Towards end-to-end speech recognition with recurrent neural networks, Proc. ICML, pp.1764-1772, 2014.