Convolutional neural networks for speech recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.10, pp.1533-1545, 2014.
The DRAGON system: an overview, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.23, issue.1, pp.24-29, 1975.
Acoustic scene classification: Classifying environments from the sounds they produce, IEEE Signal Processing Magazine, vol.32, issue.3, pp.16-34, 2015.
Theano: new features and speed improvements, Deep Learning and Unsupervised Feature Learning NIPS Workshop, 2012.
Learning deep architectures for AI, Found. Trends Mach. Learn., vol.2, issue.1, pp.1-127, 2009.
The problem of learning long-term dependencies in recurrent networks, IEEE International Conference on Neural Networks, vol.3, pp.1183-1188, 1993.
Greedy layer-wise training of deep networks, NIPS, 2007.
Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, vol.5, issue.2, pp.157-166, 1994.
Theano: a CPU and GPU math expression compiler, Proceedings of the Python for Scientific Computing Conference (SciPy), 2010.
Neural networks for pattern recognition, ch. The Multi-layer Perceptron, pp.116-161, 1995.
Acoustic scene classification with matrix factorization for unsupervised feature learning, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6445-6449, 2016.
Yaafe, an easy to use and efficient audio feature extraction software, Proceedings of the 11th ISMIR Conference, 2010.
The hybrid HMM/MLP approach, pp.155-183, 1994.
librosa: Audio and Music Signal Analysis in Python, Proceedings of the 14th Python in Science Conference, pp.18-25, 2015.
Where am I? Scene recognition for mobile robots using audio features, IEEE International Conference on Multimedia and Expo, pp.885-888, 2006.
Environmental sound recognition with time-frequency audio features, IEEE Trans. on Audio, Speech, and Language Processing, vol.17, issue.6, pp.1142-1158, 2009.
An industrial-strength audio search algorithm, Proceedings of the 4th International Conference on Music Information Retrieval, 2003.
Detection and analysis of abnormal situations through fear-type acoustic manifestations, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), vol.4, pp.21-24, 2007.
Events detection for an audio-based surveillance system, IEEE International Conference on Multimedia and Expo, pp.1306-1309, 2005.
Comparison of techniques for environmental sound recognition, Pattern Recognition Letters, vol.24, issue.15, pp.2895-2907, 2003.
Audio surveillance: A systematic review, ACM Comput. Surv., vol.48, issue.4, p.46, 2016.
Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems, vol.2, issue.4, pp.303-314, 1989.
Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.1, pp.30-42, 2012.
Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, vol.39, issue.1, pp.1-38, 1977.
Deep learning: Methods and applications, Found. Trends Signal Process., vol.7, pp.197-387, 2014.
Image feature representation of the subband power distribution for robust sound event classification, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.2, pp.367-377, 2013.
Automatic recognition of environmental sound events using all-pole group delay features, European Signal Processing Conference, pp.734-738, 2015.
Transfer learning of weakly labeled audio, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.
Audio-based context recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol.14, issue.1, pp.321-329, 2006.
Machine learning techniques for multimedia analysis, pp.59-80, 2011.
Multiview approaches to event detection and scene analysis, Romain Serizel, Alexey Ozerov, Fabio Antonacci, and Augusto Sarti, pp.243-276, 2018.
Do we need hundreds of classifiers to solve real world classification problems?, J. of Machine Learning Research, vol.15, pp.3133-3181, 2014.
Progress in pattern recognition, image analysis, computer vision, and applications: 17th Iberoamerican Congress, pp.14-36, 2012.
Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol.36, issue.4, pp.193-202, 1980.
Multilayer neural networks and Bayes decision theory, vol.11, pp.209-213, 1998.
Audio-video surveillance system for public transportation, World Congress on Railway Research, 2011.
Improving event detection for audio surveillance using Gabor filterbank features, European Signal Processing Conference, pp.719-723, 2015.
Scream and gunshot detection in noisy environments, European Signal Processing Conference, 2007.
Learning to forget: Continual prediction with LSTM, Neural Comput., vol.12, issue.10, pp.2451-2471, 2000.
Learning precise timing with LSTM recurrent networks, J. Mach. Learn. Res., vol.3, pp.115-143, 2003.
Supervised learning from incomplete data via an EM approach, Advances in Neural Information Processing Systems, vol.6, pp.120-127, 1994.
A probabilistic approach to the understanding and training of neural network classifiers, International Conference on Acoustics, Speech, and Signal Processing, vol.3, pp.1361-1364, 1990.
Speech recognition with deep recurrent neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.6645-6649, 2013.
Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Networks, vol.18, issue.5-6, pp.602-610, 2005.
Sequence transduction with recurrent neural networks, International Conference on Machine Learning, 2012.
Offline handwriting recognition with multidimensional recurrent neural networks, Advances in Neural Information Processing Systems, vol.21, pp.545-552, 2009.
Some networks that can learn, remember, and reproduce any number of complicated space-time patterns, I, Indiana Univ. Math. J., vol.19, pp.53-91, 1970.
Learning from imbalanced data, IEEE Trans. on Knowl. and Data Eng., vol.21, issue.9, pp.1263-1284, 2009.
Context-dependent sound event detection, EURASIP Journal on Audio, Speech, and Music Processing, issue.1, p.1, 2013.
Automatic classification of musical instrument sounds, Journal of New Music Research, vol.32, 2003.
A fast learning algorithm for deep belief nets, Neural Comput., vol.18, issue.7, pp.1527-1554, 2006.
Parallel distributed processing: Explorations in the microstructure of cognition, vol.1, pp.282-317, 1986.
Long short-term memory, Neural Comput., vol.9, issue.8, pp.1735-1780, 1997.
Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, 2001.
Scream detection for home applications, IEEE Conference on Industrial Electronics and Applications, pp.2115-2120, 2010.
The group method of data handling: a rival of the method of stochastic approximation, Soviet Automatic Control, vol.13, issue.3, pp.43-55, 1968.
Sound-event recognition with a companion humanoid, IEEE-RAS International Conference on Humanoid Robots (Humanoids), pp.104-111, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00768767
Sound Representation and Classification Benchmark for Domestic Robots, IEEE International Conference on Robotics and Automation, vol.2014, pp.6285-6292, 2014.
URL: https://hal.archives-ouvertes.fr/hal-00952092
The class imbalance problem: A systematic study, Intell. Data Anal., vol.6, issue.5, pp.429-449, 2002.
Continuous speech recognition by statistical methods, Proceedings of the IEEE, vol.64, issue.4, pp.532-556, 1976.
Hierarchical approach for abnormal acoustic event classification in an elevator, 8th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp.89-94, 2011.
Acoustic event detection based on non-negative matrix factorization with mixtures of local dictionaries and activation aggregation, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2259-2263, 2016.
A morphological model for simulating acoustic scenes and its application to sound event detection, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol.24, issue.10, pp.1854-1864, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01111381
Deep neural networks for automatic detection of screams and shouted speech in subway trains, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6460-6464, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01385272
The bag-of-frames approach: a not so sufficient model for urban soundscapes, Journal of the Acoustical Society of America, vol.138, issue.5, pp.487-492, 2015.
URL: https://hal.archives-ouvertes.fr/hal-01082501
The handbook of brain theory and neural networks, ch. Convolutional Networks for Images, Speech, and Time Series, pp.255-258, 1998.
Acoustic Signal Based Abnormal Event Detection in Indoor Environment using Multiclass Adaboost, IEEE Trans. on Consumer Electronics, vol.59, issue.3, pp.615-622, 2013.
Sound-event partitioning and feature normalization for robust sound-event detection, Int. Conf. on Digital Signal Processing (Hong Kong), pp.389-394, 2014.
A study on the dynamic time warping in kernel machines, Third International IEEE Conference on Signal-Image Technologies and Internet-Based System, pp.839-845, 2007.
A comparison of deep learning methods for environmental sound detection, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.
Pattern classification using neural networks, IEEE Communications Magazine, vol.27, issue.11, pp.47-50, 1989.
From statistics to neural networks: Theory and pattern recognition applications, ch. Neural Networks, Bayesian a posteriori Probabilities, and Pattern Classification, pp.83-104, 1994.
The HARPY speech recognition system, p.7619331, 1976.
Metric learning based data augmentation for environmental sound classification, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.
A logical calculus of the ideas immanent in nervous activity, The Bulletin of Mathematical Biophysics, vol.5, issue.4, pp.115-133, 1943.
Robust sound event classification using deep neural networks, IEEE/ACM Trans. on Audio, Speech, and Language Processing, vol.23, issue.3, pp.540-552, 2015.
Acoustic event detection in real life recordings, 18th European Signal Processing Conference, pp.1267-1271, 2010.
TUT database for acoustic scene classification and sound event detection, 24th European Signal Processing Conference (EUSIPCO), pp.1128-1132, 2016.
Metrics for polyphonic sound event detection, Applied Sciences, vol.6, issue.6, p.162, 2016.
Kaldi+PDNN: Building DNN-based ASR systems with Kaldi and PDNN, 2014.
Recurrent neural network based language model.
Deep neural network based learning and transferring mid-level audio features for acoustic scene classification, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.796-800, 2017.
Improvement of recognition of simultaneous speech signals using AV integration and scattering theory for humanoid robots, Speech Communication, vol.44, issue.1, pp.97-112, 2004.
Connectionist learning of belief networks, vol.56, pp.71-113, 1992.
Learning in graphical models, pp.335-368, 1998.
On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes, Advances in Neural Information Processing Systems, vol.14, pp.841-848, 2002.
Audio-visual speech recognition using deep learning, Applied Intelligence, vol.42, issue.4, pp.722-737, 2015.
On acoustic surveillance of hazardous situations, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp.165-168, 2009.
A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering, vol.22, issue.10, pp.1345-1359, 2010.
Recurrent neural networks for polyphonic sound event detection in real life recordings, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
The timbre toolbox: Extracting audio descriptors from musical signals, The Journal of the Acoustical Society of America, vol.130, issue.5, pp.2902-2916, 2011.
Automatically selecting signal descriptors for sound classification, 2002.
URL: https://hal.archives-ouvertes.fr/hal-01161323
Online real-time crowd behavior detection in video sequences, Computer Vision and Image Understanding, vol.144, pp.166-176, 2016.
Random regression forests for acoustic event detection and classification, IEEE/ACM Trans. on Audio, Speech, and Language Processing, vol.23, issue.1, pp.20-31, 2015.
Shout detection in noise, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp.4968-4971, 2011.
Detection of shouted speech in the presence of ambient noise, Interspeech, pp.2621-2624, 2011.
Acoustic modeling using deep belief networks, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.1, pp.14-22, 2012.
Understanding how deep belief networks perform acoustic modelling, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4273-4276, 2012.
A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, vol.77, issue.2, pp.257-286, 1989.
The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, issue.6, pp.386-408, 1958.
Audio events detection in public transport vehicle, Intelligent Transportation Systems Conference, pp.733-738, 2006.
URL: https://hal.archives-ouvertes.fr/hal-00664991
Learning representations by back-propagating errors, Nature, vol.323, pp.533-536, 1986.
Parallel distributed processing: Explorations in the microstructure of cognition, vol.1, pp.318-362, 1986.
Convolutional, long short-term memory, fully connected deep neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4580-4584, 2015.
Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.26, issue.1, pp.43-49, 1978.
Feature learning with deep scattering for urban sound analysis, European Signal Processing Conference, pp.729-733, 2015.
Video anomaly detection based on local statistical aggregates, IEEE Conference on Computer Vision and Pattern Recognition, pp.2112-2119, 2012.
Deep learning in neural networks: An overview, vol.61, pp.85-117, 2015.
Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.
The CLEAR 2007 evaluation, 2007.
An introduction to audio content analysis: Applications in signal processing and music informatics, Computer Music Journal, vol.37, issue.4, pp.90-91, 2013.
Computers in the human interaction loop, ch. Acoustic Event Detection and Classification, pp.61-73, 2009.
Scream and gunshot detection and localization for audio-surveillance systems, IEEE Conference on Advanced Video and Signal Based Surveillance, pp.21-26, 2007.
Gammatone wavelet features for sound classification in surveillance applications, European Signal Processing Conference, pp.1658-1662, 2012.
Sequence to sequence: video to text, IEEE International Conference on Computer Vision (ICCV), pp.4534-4542, 2015.
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop, 2016.
Gabor-based nonuniform scale-frequency map for environmental sound classification in home automation, IEEE Trans. on Automation Science and Engineering, vol.11, issue.2, pp.607-613, 2014.
A first attempt at polyphonic sound event detection using connectionist temporal classification, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 2017.
Acoustic scene recognition with deep neural networks (DCASE challenge), DCASE2016 Challenge, 2016.
Backpropagation through time: what it does and how to do it, vol.78, pp.1550-1560, 1990.
Surveillance robot utilizing video and audio information, J. of Intelligent and Robotic Systems, vol.55, issue.4, pp.403-421, 2009.
A brief survey on sequence classification, SIGKDD Explor. Newsl., vol.12, issue.1, pp.40-48, 2010.
Abnormal crowd behavior detection based on the energy model, IEEE International Conference on Information and Automation, pp.495-500, 2011.
Speech recognition evaluation: a review of the U.S. CSR and LVCSR programmes, Computer Speech and Language, vol.12, issue.4, pp.263-279, 1998.
Neural networks for classification: a survey, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol.30, issue.4, pp.451-462, 2000.
Deep belief networks based voice activity detection, IEEE Trans. on Audio, Speech, and Language Processing, vol.21, issue.4, pp.697-710, 2013.
Revised selected papers, ch. HMM-Based Acoustic Event Detection with AdaBoost Feature Selection, pp.345-353, 2007.
Embedded security system for multi-modal surveillance in a railway carriage, SPIE Security and Defence, vol.9652, pp.9652-9663, 2015.