L. Atlas and S. A. Shamma, Joint acoustic and modulation frequency, EURASIP Journal on Applied Signal Processing, pp.668-675, 2003.

R. M. Bittner, B. McFee, J. Salamon, P. Li, and J. P. Bello, Deep salience representations for f0 estimation in polyphonic music, Proc. of ISMIR (International Society for Music Information Retrieval), 2017.

S. Böck, F. Krebs, and G. Widmer, Accurate tempo estimation based on recurrent neural networks and resonating comb filters, Proc. of ISMIR (International Society for Music Information Retrieval), 2015.

C. Chen, M. Cremer, K. Lee, P. DiMaria, and H. Wu, Improving perceived tempo estimation by statistical modeling of higher-level musical descriptors, Audio Engineering Society Convention 126, 2009.

D. Clevert, T. Unterthiner, and S. Hochreiter, Fast and accurate deep network learning by exponential linear units (ELUs), 2015.

A. Faraldo, S. Jordà, and P. Herrera, A multi-profile method for key estimation in EDM, AES International Conference on Semantic Audio, 2017.

J. Foote, M. Cooper, and U. Nam, Audio retrieval by rhythmic similarity, Proc. of ISMIR (International Society for Music Information Retrieval), 2002.

M. Gainza and E. Coyle, Tempo detection using a hybrid multiband approach, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.1, pp.57-68, 2011.

A. Gkiokas, V. Katsouros, and G. Carayannis, Reducing tempo octave errors by periodicity vector coding and SVM learning, Proc. of ISMIR (International Society for Music Information Retrieval), 2012.

M. Goto, H. Hashiguchi, T. Nishimura, and R. Oka, RWC music database: Popular, classical and jazz music databases, Proc. of ISMIR (International Society for Music Information Retrieval), 2002.

M. Goto and Y. Muraoka, A beat tracking system for acoustic signals of music, Proceedings of the second ACM international conference on Multimedia, 1994.

F. Gouyon, A. Klapuri, S. Dixon, M. Alonso, G. Tzanetakis et al., An experimental comparison of audio tempo induction algorithms, IEEE Transactions on Audio, Speech, and Language Processing, vol.14, issue.5, pp.1832-1844, 2006.

S. W. Hainsworth, Techniques for the Automated Analysis of Musical Audio, PhD thesis, 2004.

A. Holzapfel, M. E. P. Davies, J. R. Zapata et al., Selective sampling for beat tracking evaluation, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.9, pp.2539-2548, 2012.

A. Holzapfel and Y. Stylianou, Scale transform in rhythmic similarity of music, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.1, pp.176-185, 2011.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, 2015.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

A. P. Klapuri, A. J. Eronen, and J. T. Astola, Analysis of the meter of acoustic musical signals, IEEE Transactions on Audio, Speech, and Language Processing, vol.14, issue.1, pp.342-355, 2006.

P. Knees, A. Faraldo, P. Herrera, R. Vogl, S. Böck et al., Two data sets for tempo estimation and key detection in electronic dance music annotated from user corrections, Proc. of ISMIR (International Society for Music Information Retrieval), 2015.

M. Levy, Improving perceptual tempo estimation with crowd-sourced annotations, Proc. of ISMIR (International Society for Music Information Retrieval), 2011.

R. C. Maher and J. W. Beauchamp, Fundamental frequency estimation of musical signals using a two-way mismatch procedure, JASA (Journal of the Acoustical Society of America), vol.95, issue.4, pp.2254-2263, 1994.

U. Marchand, Q. Fresnel, and G. Peeters, GTZAN-Rhythm: Extending the GTZAN test-set with beat, downbeat and swing annotations, Late-Breaking/Demo Session of the 16th International Society for Music Information Retrieval Conference, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01252607

U. Marchand and G. Peeters, The modulation scale spectrum and its application to rhythm-content description, Proc. of DAFx (International Conference on Digital Audio Effects), 2014.

U. Marchand and G. Peeters, The extended ballroom dataset, Late-Breaking/Demo Session of the 17th International Society for Music Information Retrieval Conference, 2016.

U. Marchand and G. Peeters, Scale and shift invariant time/frequency representation using auditory statistics: Application to rhythm description, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01368206

R. J. McAulay and T. F. Quatieri, Speech analysis/synthesis based on a sinusoidal representation, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.34, pp.744-754, 1986.

B. McFee, C. Raffel, D. Liang, D. P. W. Ellis, M. McVicar, E. Battenberg, and O. Nieto, librosa: Audio and music signal analysis in Python, Proceedings of the 14th Python in Science Conference, 2015.

V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, Proc. of ICML (International Conference on Machine Learning), 2010.

G. Peeters, A large set of audio features for sound description (similarity and classification) in the CUIDADO project, 2004.

G. Peeters, Template-based estimation of time-varying tempo, EURASIP Journal on Advances in Signal Processing, p.67215, 2006.

G. Peeters, Template-based estimation of tempo: using unsupervised or supervised learning to create better spectral templates, Proc. of DAFx (International Conference on Digital Audio Effects), pp.209-212, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01106562

G. Peeters, Spectral and temporal periodicity representations of rhythm for the automatic classification of music audio signal, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.5, pp.1242-1252, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01106655

G. Peeters and J. Flocon-Cholet, Perceptual tempo estimation using GMM-regression, Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01106792

G. Percival and G. Tzanetakis, Streamlined tempo estimation based on autocorrelation and cross-correlation with pulses, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.12, pp.1765-1776, 2014.

C. Raffel, Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching, PhD thesis, 2016.

E. D. Scheirer, Tempo and beat analysis of acoustic musical signals, JASA (Journal of the Acoustical Society of America), vol.103, issue.1, pp.588-601, 1998.

H. Schreiber and M. Müller, A single-step approach to musical tempo estimation using a convolutional neural network, Proc. of ISMIR (International Society for Music Information Retrieval), 2018.

H. Schreiber and M. Müller, A postprocessing procedure for improving music tempo estimates using supervised learning, Proc. of ISMIR (International Society for Music Information Retrieval), 2017.

X. Serra and J. Smith, Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition, Computer Music Journal, vol.14, issue.4, pp.12-24, 1990.

K. Seyerlehner, G. Widmer, and D. Schnitzer, From rhythm patterns to perceived tempo, Proc. of ISMIR (International Society for Music Information Retrieval), 2007.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

G. Tzanetakis and P. Cook, Musical genre classification of audio signals, IEEE Transactions on Speech and Audio Processing, vol.10, issue.5, pp.293-302, 2002.

L. Xiao, A. Tian, W. Li, and J. Zhou, Using statistic model to capture the association between timbre and perceived tempo, Proc. of ISMIR (International Society for Music Information Retrieval), 2008.

J. Zapata and E. Gómez, Comparative evaluation and combination of audio tempo estimation approaches, AES 42nd International Conference on Semantic Audio, 2011.