[Table residue: per-event AP (AUC) and AP drop (% drop), Audioset vs. Audioset-At-30. Surviving row labels: Speech synthesizer; Air horn, truck horn; Vehicle horn, honking; Chopping. Values not recoverable.]

TABLE VIII. BEST 10 AND WORST 10 PERFORMING EVENTS (ORDERED BY AP) FOR YOUTUBE-WILD
[Columns: Events (best 10), YouTube-Wild, Audioset-40; Events (worst 10), YouTube-Wild, Audioset-40. Row values not recoverable. Surviving event labels: Engine; Violin, fiddle; Chicken, rooster; Animal; Laughter.]

REFERENCES

C. Clavel, T. Ehrette, and G. Richard, Events detection for an audio-based surveillance system, 2005 IEEE International Conference on Multimedia and Expo (ICME), pp.1306-1309, 2005.
DOI : 10.1109/icme.2005.1521669

URL : http://perso.telecom-paristech.fr/~grichard/Publications/ICME05_clavel.pdf

D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley, Detection and Classification of Acoustic Scenes and Events, IEEE Transactions on Multimedia, vol.17, issue.10, pp.1733-1746, 2015.
DOI : 10.1109/TMM.2015.2428998

URL : https://hal.archives-ouvertes.fr/hal-01123760

A. Mesaros, T. Heittola, and T. Virtanen, TUT database for acoustic scene classification and sound event detection, 2016 24th European Signal Processing Conference (EUSIPCO), pp.1128-1132, 2016.
DOI : 10.1109/EUSIPCO.2016.7760424

C. Zieger and M. Omologo, Acoustic event detection: ITC-irst AED database, 2005.

K. J. Piczak, ESC: Dataset for Environmental Sound Classification, Proceedings of the 23rd ACM international conference on Multimedia, MM '15, pp.1015-1018, 2015.
DOI : 10.1145/2733373.2806390

J. Salamon, C. Jacoby, and J. P. Bello, A Dataset and Taxonomy for Urban Sound Research, Proceedings of the ACM International Conference on Multimedia, MM '14, pp.1041-1044, 2014.
DOI : 10.1145/2647868.2655045

A. Kumar and B. Raj, Audio Event Detection using Weakly Labeled Data, Proceedings of the 2016 ACM on Multimedia Conference, MM '16, pp.1038-1047, 2016.
DOI : 10.1145/2964284.2964310

URL : http://arxiv.org/pdf/1605.02401

A. Kumar and B. Raj, Deep CNN framework for audio event recognition using weakly labeled web data, NIPS Workshop on Machine Learning for Audio, 2017.

A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah et al., DCASE 2017 challenge setup: Tasks, datasets and baseline system, DCASE 2017 Workshop on Detection and Classification of Acoustic Scenes and Events, 2017.

URL : https://hal.archives-ouvertes.fr/hal-01627981

J. F. Gemmeke, D. P. Ellis, D. Freedman, A. Jansen, W. Lawrence et al., Audio Set: An ontology and human-labeled dataset for audio events, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
DOI : 10.1109/ICASSP.2017.7952261

Z. Lu, Z. Fu, T. Xiang, P. Han, L. Wang et al., Learning from Weak and Noisy Labels for Semantic Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.3, pp.486-500, 2017.
DOI : 10.1109/TPAMI.2016.2552172

URL : https://repository.kaust.edu.sa/bitstream/10754/608585/1/07450177.pdf

J. Tang, S. Yan, R. Hong, G. Qi, and T. Chua, Inferring semantic concepts from community-contributed images and noisy tags, Proceedings of the 17th ACM international conference on Multimedia, MM '09, pp.223-232, 2009.
DOI : 10.1145/1631272.1631305

Z. Feng, S. Feng, R. Jin, and A. K. Jain, Image Tag Completion by Noisy Matrix Recovery, European Conference on Computer Vision, pp.424-438, 2014.
DOI : 10.1007/978-3-319-10584-0_28

E. Wold, T. Blum, D. Keislar, and J. Wheaten, Content-based classification, search, and retrieval of audio, IEEE Multimedia, vol.3, issue.3, pp.27-36, 1996.
DOI : 10.1109/93.556537

G. Guo and S. Z. Li, Content-based audio classification and retrieval by support vector machines, IEEE Transactions on Neural Networks, vol.14, issue.1, pp.209-215, 2003.

G. Valenzise, L. Gerosa, M. Tagliasacchi, F. Antonacci, and A. Sarti, Scream and gunshot detection and localization for audio-surveillance systems, 2007 IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS), pp.21-26, 2007.
DOI : 10.1109/avss.2007.4425280

URL : http://home.deib.polimi.it/tagliasa/publications/2007/AVSS2007_1_Tagliasacchi.pdf

L. Gerosa, G. Valenzise, M. Tagliasacchi, F. Antonacci, and A. Sarti, Scream and gunshot detection in noisy environments, 2007 15th European Signal Processing Conference (EUSIPCO), pp.1216-1220, 2007.

M. Janvier, X. Alameda-Pineda, L. Girin, and R. Horaud, Sound representation and classification benchmark for domestic robots, 2014 IEEE International Conference on Robotics and Automation (ICRA), pp.6285-6292, 2014.
DOI : 10.1109/ICRA.2014.6907786

URL : http://arxiv.org/pdf/1402.3689

C. Debes, A. Merentitis, S. Sukhanov, M. Niessen, N. Frangiadakis et al., Monitoring Activities of Daily Living in Smart Homes: Understanding human behavior, IEEE Signal Processing Magazine, vol.33, issue.2, pp.81-94, 2016.
DOI : 10.1109/MSP.2015.2503881

S. Greene, H. Thapliyal, and D. Carpenter, IoT-Based Fall Detection for Smart Home Environments, 2016 IEEE International Symposium on Nanoelectronic and Information Systems (iNIS), pp.23-28, 2016.
DOI : 10.1109/iNIS.2016.017

Y. Zigel, D. Litvak, and I. Gannot, A Method for Automatic Fall Detection of Elderly People Using Floor Vibrations and Sound - Proof of Concept on Human Mimicking Doll Falls, IEEE Transactions on Biomedical Engineering, vol.56, issue.12, pp.2858-2867, 2009.
DOI : 10.1109/TBME.2009.2030171

Y. Li, Z. Zeng, M. Popescu, and K. Ho, Acoustic fall detection using a circular microphone array, 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp.2242-2245, 2010.

D. Stowell and M. D. Plumbley, Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning, PeerJ, vol.2, p.e488, 2014.
DOI : 10.7717/peerj.488

URL : https://peerj.com/articles/488.pdf

X. Zhuang, X. Zhou, M. A. Hasegawa-Johnson, and T. S. Huang, Real-world acoustic event detection, Pattern Recognition Letters, vol.31, issue.12, pp.1543-1551, 2010.
DOI : 10.1016/j.patrec.2010.02.005

S. Pancoast and M. Akbacak, Bag-of-audio-words approach for multimedia event classification, Thirteenth Annual Conference of the International Speech Communication Association, 2012.

H. Lim, M. J. Kim, and H. Kim, Robust sound event classification using LBP-HOG based bag-of-audio-words feature representation, Sixteenth Annual Conference of the International Speech Communication Association, 2015.

X. Lu, Y. Tsao, S. Matsuda, and C. Hori, Sparse representation based on a bag of spectral exemplars for acoustic event detection, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6255-6259, 2014.
DOI : 10.1109/ICASSP.2014.6854807

A. Kumar, P. Dighe, R. Singh, S. Chaudhuri, and B. Raj, Audio event detection from acoustic unit occurrence patterns, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.489-492, 2012.
DOI : 10.1109/ICASSP.2012.6287923

URL : http://mlsp.cs.cmu.edu/people/rsingh/docs/eventdet.pdf

J. F. Gemmeke, L. Vuegen, P. Karsmakers, and B. Vanrumste, An exemplar-based NMF approach to audio event detection, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp.1-4, 2013.
DOI : 10.1109/WASPAA.2013.6701847

T. Heittola, A. Mesaros, T. Virtanen, and A. Eronen, Sound event detection in multisource environments using source separation, Machine Listening in Multisource Environments, 2011.

K. J. Piczak, Environmental sound classification with convolutional neural networks, 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), pp.1-6, 2015.
DOI : 10.1109/MLSP.2015.7324337

H. Zhang, I. McLoughlin, and Y. Song, Robust sound event recognition using convolutional neural networks, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.559-563, 2015.
DOI : 10.1109/ICASSP.2015.7178031

URL : http://kar.kent.ac.uk/55020/1/cnn_mh.pdf

H. Phan, L. Hertel, M. Maass, and A. Mertins, Robust audio event recognition with 1-max pooling convolutional neural networks, arXiv preprint, 2016.
DOI : 10.21437/interspeech.2016-123

URL : http://arxiv.org/pdf/1604.06338

S. Hershey, S. Chaudhuri, D. P. Ellis, J. F. Gemmeke, A. Jansen et al., CNN architectures for large-scale audio classification, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.131-135, 2017.
DOI : 10.1109/ICASSP.2017.7952132

URL : http://arxiv.org/pdf/1609.09430

Z. Zhou, Multi-instance learning: A survey, 2004.

S. Andrews, I. Tsochantaridis, and T. Hofmann, Support vector machines for multiple-instance learning, Advances in Neural Information Processing Systems, pp.577-584, 2003.

A. Kumar and B. Raj, Weakly supervised scalable audio content analysis, 2016 IEEE International Conference on Multimedia and Expo (ICME), pp.1-6, 2016.
DOI : 10.1109/ICME.2016.7552989

URL : http://arxiv.org/pdf/1606.03664

T. Su, J. Liu, and Y. Yang, Weakly-supervised audio event detection using event-specific Gaussian filters and fully convolutional networks, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.791-795, 2017.
DOI : 10.1109/ICASSP.2017.7952264

Y. Xu, Q. Kong, Q. Huang, W. Wang, and M. D. Plumbley, Attention and Localization Based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging, Interspeech 2017, 2017.
DOI : 10.21437/Interspeech.2017-486

URL : http://arxiv.org/pdf/1703.06052

A. Kumar, M. Khadkevich, and C. Fügen, Knowledge transfer from weakly labeled audio using convolutional neural network for sound events and scenes, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.
DOI : 10.1109/CVPR.2016.90

URL : http://arxiv.org/pdf/1512.03385

A. Kumar and B. Raj, Audio event and scene recognition: A unified approach using strongly and weakly labeled data, 2017 International Joint Conference on Neural Networks (IJCNN), pp.3475-3482, 2017.
DOI : 10.1109/IJCNN.2017.7966293

URL : http://arxiv.org/pdf/1611.04871

V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp.807-814, 2010.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

C. Buckley and E. Voorhees, Retrieval evaluation with incomplete information, Proceedings of the 27th annual international conference on Research and development in information retrieval, SIGIR '04, pp.25-32, 2004.
DOI : 10.1145/1008992.1009000

URL : http://comminfo.rutgers.edu/~muresan/IR/Docs/Articles/sigirBuckley2004.pdf

T. Fawcett, ROC graphs: Notes and practical considerations for researchers, Machine Learning, vol.31, issue.1, pp.1-38, 2004.

Y. Wu and T. Lee, Reducing Model Complexity for DNN Based Large-Scale Audio Classification, arXiv e-prints, 2017.