American Psychiatric Association, Diagnostic and Statistical Manual of Mental Disorders, 2013.

S. Abdullah, M. Matthews, E. Frank, G. Doherty, G. Gay et al., Automatic detection of social rhythms in bipolar disorder, Journal of the American Medical Informatics Association, vol.23, issue.1, pp.538-543, 2016.

S. Amiriparian, N. Cummins, S. Ottl, M. Gerczuk, and B. Schuller, Sentiment analysis using image-based deep spectrum features, Proceedings of the 2nd International Workshop on Automatic Sentiment Analysis in the Wild (WASA), held in conjunction with the 7th biannual Conference on Affective Computing and Intelligent Interaction, 2017.

S. Amiriparian, M. Gerczuk, S. Ottl, N. Cummins, M. Freitag et al., Snore sound classification using image-based deep spectrum features, Proceedings of INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association. ISCA, pp.3512-3516, 2017.

A. Baird, S. Amiriparian, N. Cummins, A. M. Alcorn, A. Batliner et al., Automatic classification of autistic child vocalisations: A novel database and results, Proceedings of INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association. ISCA, pp.849-853, 2017.

T. Baltrušaitis, P. Robinson, and L.-P. Morency, OpenFace: An open source facial behavior analysis toolkit, Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2016.

T. Baltrušaitis, C. Ahuja, and L.-P. Morency, Multimodal machine learning: A survey and taxonomy, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.

I. E. Bauer, J. C. Soares, S. Selek, and T. D. Meyer, The link between refractoriness and neuroprogression in treatment-resistant bipolar disorder, Neuroprogression in Psychiatric Disorders (Modern Trends in Pharmacopsychiatry, vol.31), pp.10-26, 2017.

Y. Bengio, A. Courville, and P. Vincent, Representation learning: A review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.4, pp.1798-1828, 2013.

B. M. Booth, K. Mundnich, and S. S. Narayanan, A novel method for human bias correction of continuous-time annotations, Proceedings of the 43rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.

E. Çiftçi, H. Kaya, H. Güleç, and A. A. Salah, The Turkish audio-visual bipolar disorder corpus, Proceedings of the 1st Asian Conference on Affective Computing and Intelligent Interaction, 2018.

C. A. Corneanu, M. O. Simón, J. F. Cohn, and S. E. Guerrero, Survey on RGB, 3D, thermal, and multimodal approaches for facial expression recognition: History, trends, and affect-related applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, pp.1548-1568, 2016.

R. Cowie, E. Douglas-Cowie, S. Savvidou, E. McMahon, M. Sawey et al., Feeltrace: An instrument for recording perceived emotion in real time, ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion, pp.19-24, 2000.

N. Cummins, S. Amiriparian, G. Hagerer, A. Batliner, S. Steidl et al., An image-based deep spectrum feature representation for the recognition of emotional speech, Proceedings of the 25th ACM International Conference on Multimedia, pp.478-484, 2017.

N. Cummins, S. Amiriparian, S. Ottl, M. Gerczuk, M. Schmitt et al., Multimodal Bag-of-Words for cross domains sentiment analysis, Proceedings of the 43rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.

J. Deng, N. Cummins, M. Schmitt, K. Qian, F. Ringeval et al., Speech-based diagnosis of autism spectrum condition by generative adversarial network representations, Proceedings of the 7th International Conference on Digital Health (DH), pp.53-57, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02080880

S. K. D'Mello and J. Kory, A review and meta-analysis of multimodal affect detection systems, ACM Computing Surveys, vol.47, issue.3, 2015.

P. Ekman, Universals and cultural differences in facial expressions of emotion, Nebraska Symposium on motivation, vol.19, pp.207-283, 1971.

H. A. Elfenbein and N. Ambady, On the universality and cultural specificity of emotion recognition: A meta-analysis, Psychological Bulletin, vol.128, pp.203-235, 2002.

A. Esposito, A. M. Esposito, and C. Vogel, Needs and challenges in human computer interaction for processing social emotional information, Pattern Recognition Letters, vol.66, pp.41-51, 2015.

F. Eyben, K. R. Scherer, B. W. Schuller, J. Sundberg, E. André et al., The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for voice research and affective computing, IEEE Transactions on Affective Computing, vol.7, issue.2, pp.190-202, 2016.

F. Eyben, F. Weninger, F. Groß, and B. Schuller, Recent developments in openSMILE, the Munich open-source multimedia feature extractor, Proceedings of the 21st ACM International Conference on Multimedia, pp.835-838, 2013.

F. Eyben, F. Weninger, S. Squartini, and B. Schuller, Real-life voice activity detection with LSTM Recurrent Neural Networks and an application to Hollywood movies, Proceedings of the 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013.

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, LIBLINEAR: A library for large linear classification, Journal of Machine Learning Research, vol.9, pp.1871-1874, 2008.

M. Faurholt-Jepsen, J. Busk, M. Frost, M. Vinberg, E. M. Christensen et al., Voice analysis as an objective state marker in bipolar disorder, Translational Psychiatry, vol.6, p.e856, 2016.

F. Zhou and F. De la Torre, Canonical time warping for alignment of human behavior, Proceedings of the 23rd Annual Conference on Advances in Neural Information Processing Systems (NIPS). Neural Information Processing Systems Foundation, 2009.

F. Zhou and F. De la Torre, Generalized time warping for multimodal alignment of human motion, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1282-1289, 2012.

S. M. Feraru, D. Schuller, and B. Schuller, Cross-language acoustic emotion recognition: An overview and some tendencies, Proceedings of the 6th biannual Conference on Affective Computing and Intelligent Interaction (ACII), pp.125-131, 2015.

E. Frank, I. Soreca, H. A. Swartz, A. M. Fagiolini, A. G. Mallinger et al., The role of interpersonal and social rhythm therapy in improving occupational functioning in patients with bipolar I disorder, The American Journal of Psychiatry, vol.165, pp.1559-1565, 2008.

S. Ghosh, E. Laksana, L.-P. Morency, and S. Scherer, Representation learning for speech emotion recognition, Proceedings of INTERSPEECH 2016, 17th Annual Conference of the International Speech Communication Association. ISCA, pp.3603-3607, 2016.

R. Gravina, P. Alinia, H. Ghasemzadeh, and G. Fortino, Multi-sensor fusion in body sensor networks: State-of-the-art and research challenges, Information Fusion, vol.35, pp.68-80, 2017.

M. Grimm and K. Kroschel, Evaluation of natural emotions using self assessment manikins, Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.381-385, 2005.

R. Gupta, K. Audhkhasi, Z. Jacokes, A. Rozga, and S. Narayanan, Modeling multiple time series annotations as noisy distortions of the ground truth: An Expectation-Maximization approach, IEEE Transactions on Affective Computing, vol.9, pp.76-89, 2018.

J. Han, Z. Zhang, M. Schmitt, M. Pantic, and B. Schuller, From hard to soft: Towards more human-like emotion recognition by modelling the perception uncertainty, Proceedings of the 25th ACM International Conference on Multimedia, pp.890-897, 2017.

L. He, E. Pei, D. Jiang, P. Wu, L. Yang et al., Multimodal affective dimension prediction using Deep Bidirectional Long Short-Term Memory Recurrent Neural Networks, Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge (AVEC), co-located with the 23rd ACM International Conference on Multimedia, pp.73-80, 2015.

Z. Huang, N. Cummins, T. Dang, B. Stasak, P. Le et al., An investigation of annotation delay compensation and output-associative fusion for multimodal continuous emotion prediction, Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge (AVEC), co-located with the 23rd ACM International Conference on Multimedia, 2015.

Z. Huang and J. Epps, Prediction of emotion change from speech, Frontiers in ICT, vol.5, 2018.

J. D. Hunter, Matplotlib: A 2D graphics environment, Computing in Science & Engineering, vol.9, pp.90-95, 2007.

Z. N. Karam, E. M. Provost, S. Singh, J. Montgomery, C. Archer et al., Ecologically valid long-term mood monitoring of individuals with bipolar disorder using speech, Proceedings of the 39th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.4858-4862, 2014.

H. Kaya and A. A. Karpov, Efficient and effective strategies for cross-corpus acoustic emotion recognition, Neurocomputing, vol.275, pp.1028-1062, 2018.

S. Khorram, J. Gideon, M. McInnis, and E. M. Provost, Recognition of depression in bipolar disorder: Leveraging cohort and person-specific knowledge, Proceedings of INTERSPEECH 2016, 17th Annual Conference of the International Speech Communication Association. ISCA, pp.1215-1219, 2016.

A. M. Kilbourne, D. Goodrich, D. J. Miklowitz, K. Austin, E. P. Post et al., Characteristics of patients with bipolar disorder managed in VA primary care or specialty mental health care settings, Psychiatric Services, vol.61, pp.500-507, 2010.

A. M. Kilbourne, D. E. Goodrich, A. N. O'Donnell, and C. J. Miller, Integrating bipolar disorder management in primary care, Current Psychiatry Reports, vol.14, issue.6, pp.687-695, 2012.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep Convolutional Neural Networks, Proceedings of the 26th Annual Conference on Advances in Neural Information Processing Systems (NIPS), vol.25, pp.1097-1105, 2012.
DOI : 10.1145/3065386

L. I. Lin, A concordance correlation coefficient to evaluate reproducibility, Biometrics, vol.45, pp.255-268, 1989.

P. Lopes, G. N. Yannakakis, and A. Liapis, RankTrace: Relative and unbounded affect annotation, Proceedings of the 7th biannual Conference on Affective Computing and Intelligent Interaction (ACII), pp.158-163, 2017.
DOI : 10.1109/acii.2017.8273594

S. Mariooryad and C. Busso, Correcting time-continuous emotional labels by modeling the reaction lag of evaluators, IEEE Transactions on Affective Computing, vol.6, pp.97-108, 2015.

A. Mencattini, F. Mosciano, M. C. Comes, T. D. Gregorio, G. Raguso et al., An emotional modulation model as signature for the identification of children developmental disorders, Scientific Reports, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01993360

K. R. Merikangas, M. Ames, L. Cui, P. E. Stang, T. B. Ustun et al., The impact of comorbidity of mental and physical conditions on role disability in the US adult household population, Archives of General Psychiatry, vol.64, pp.1180-1188, 2007.

G. A. Miller, The magical number seven, plus or minus two: Some limits on our capacity for processing information, Psychological Review, vol.63, issue.2, pp.81-97, 1956.

M. Müller, Dynamic time warping. In Information retrieval for music and motion, pp.69-86, 2007.

M. A. Nicolaou, V. Pavlovic, and M. Pantic, Dynamic probabilistic CCA for analysis of affective behavior and fusion of continuous annotations, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, pp.1299-1311, 2014.

J. Nicolle, V. Rapp, K. Bailly, L. Prevost, and M. Chetouani, Robust continuous prediction of human emotions using multiscale dynamic cues, Proceedings of the 14th ACM International Conference on Multimodal Interaction (ICMI), pp.501-508, 2012.

World Health Organization, Mental disorders affect one in four people, 2001.

World Health Organization, The global burden of disease: 2004 update. Table A2: Burden of disease in DALYs by cause, sex and income group in WHO regions, estimates for 2004.

M. Pantic, N. Sebe, J. F. Cohn, and T. Huang, Affective Multimodal Human-computer Interaction, Proceedings of the 13th Annual ACM International Conference on Multimedia (MULTIMEDIA), pp.669-676, 2005.

S. Reilly, C. Planner, M. Hann, D. Reeves, I. Nazareth et al., The role of primary care in service provision for people with severe mental illness in the United Kingdom, PLoS One, 2012.

F. Ringeval, F. Eyben, E. Kroupi, A. Yuce, J. Thiran et al., Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data, Pattern Recognition Letters, vol.66, pp.22-30, 2015.

F. Ringeval, B. Schuller, M. Valstar, R. Cowie, and M. Pantic, Summary for AVEC 2017 - Real-life depression and affect challenge and workshop, Proceedings of the 25th ACM International Conference on Multimedia, pp.1963-1964, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02080833

F. Ringeval, B. Schuller, M. Valstar, J. Gratch, R. Cowie et al., AVEC 2017 - Real-life depression, and affect recognition workshop and challenge, Proceedings of the 7th International Workshop on Audio/Visual Emotion Challenge (AVEC), co-located with the 25th ACM International Conference on Multimedia, pp.3-9, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02080874

F. Ringeval, B. Schuller, M. Valstar, S. Jaiswal, E. Marchi et al., AV+EC 2015 - The first affect recognition challenge bridging across audio, video, and physiological data, Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge (AVEC), co-located with the ACM International Conference on Multimedia, pp.3-8, 2015.

F. Ringeval, A. Sonderegger, B. Noris, A. Billard, J. Sauer et al., On the influence of emotional feedback on emotion awareness and gaze behavior, Proceedings of the 5th biannual Conference on Affective Computing and Intelligent Interaction (ACII), pp.448-453, 2013.

F. Ringeval, A. Sonderegger, J. Sauer, and D. Lalanne, Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions, Proceedings of the 2nd International Workshop on Emotion Representation, Analysis and Synthesis in Continuous Time and Space (EmoSPACE), held in conjunction with the 10th International IEEE Conference on Automatic Face and Gesture Recognition (FG), vol.8, 2013.

J. A. Russell, A circumplex model of affect, Journal of Personality and Social Psychology, vol.39, issue.6, pp.1161-1178, 1980.
URL : https://hal.archives-ouvertes.fr/hal-01086372

J. A. Russell, Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies, Psychological Bulletin, vol.115, pp.102-141, 1994.

H. Sagha, J. Deng, M. Gavryukova, J. Han, B. Schuller et al., Cross lingual speech emotion recognition using canonical correlation analysis on principal component subspace, Proceedings of the 41st IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.5800-5804, 2016.
DOI : 10.1109/icassp.2016.7472789

K. R. Scherer, R. Banse, and H. G. Wallbott, Emotion inferences from vocal expression correlate across languages and cultures, Journal of Cross-Cultural Psychology, vol.32, issue.1, pp.76-92, 2001.
DOI : 10.1177/0022022101032001009

M. Schmitt, F. Ringeval, and B. Schuller, At the border of acoustics and linguistics: Bag-of-Audio-Words for the recognition of emotions in speech, Proceedings of INTERSPEECH 2016, 17th Annual Conference of the International Speech Communication Association. ISCA, pp.495-499, 2016.

M. Schmitt and B. Schuller, openXBOW - Introducing the Passau open-source crossmodal Bag-of-Words toolkit, Journal of Machine Learning Research, vol.18, pp.1-5, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01580190

B. Schuller, S. Steidl, A. Batliner, A. Vinciarelli, K. Scherer et al., The INTERSPEECH 2013 Computational Paralinguistics Challenge: Social signals, conflict, emotion, autism, Proceedings of INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association. ISCA, pp.148-152, 2013.

B. Schuller, M. Valstar, F. Eyben, R. Cowie, and M. Pantic, AVEC 2012 - The continuous Audio/Visual Emotion Challenge, Proceedings of the 14th ACM International Conference on Multimodal Interaction (ICMI), pp.449-456, 2012.

B. Schuller, M. Valstar, F. Eyben, G. McKeown, R. Cowie et al., AVEC 2011 - The First International Audio/Visual Emotion Challenge, Proceedings of the 4th biannual International Conference on Affective Computing and Intelligent Interaction (ACII), pp.415-424, 2011.

S. Mariooryad and C. Busso, Analysis and compensation of the reaction lag of evaluators in continuous emotional annotations, Proceedings of the 5th biannual International Conference on Affective Computing and Intelligent Interaction (ACII), pp.85-90, 2013.

N. Thammasan, K. Fukui, and M. Numao, An investigation of annotation smoothing for EEG-based continuous music-emotion recognition, Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2016.

G. Trigeorgis, M. A. Nicolaou, B. Schuller, and S. Zafeiriou, Deep Canonical Time Warping for simultaneous alignment and representation learning of sequences, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.40, pp.1128-1138, 2018.

G. Trigeorgis, F. Ringeval, R. Brueckner, E. Marchi, M. A. Nicolaou et al., Adieu features? End-to-end speech emotion recognition using a deep Convolutional Recurrent Network, Proceedings of the 41st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.5200-5204, 2016.

A. Tversky, Intransitivity of preferences, Psychological Review, vol.76, pp.31-48, 1969.

M. Valstar, J. Gratch, B. Schuller, F. Ringeval, R. Cowie et al., Summary for AVEC 2016: Depression, mood, and emotion recognition workshop and challenge, Proceedings of the 24th ACM International Conference on Multimedia, pp.1483-1484, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01494127

M. Valstar, J. Gratch, B. Schuller, F. Ringeval, D. Lalanne et al., AVEC 2016 - Depression, mood, and emotion recognition workshop and challenge, Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge (AVEC), co-located with the ACM International Conference on Multimedia, pp.3-10, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01494127

M. Valstar, B. Schuller, J. Krajewski, R. Cowie, and M. Pantic, Workshop summary for the 3rd international Audio/Visual Emotion Challenge and workshop, Proceedings of the 21st ACM International Conference on Multimedia, pp.1085-1086, 2013.

M. Valstar, B. Schuller, J. Krajewski, R. Cowie, and M. Pantic, AVEC 2014: The 4th international Audio/Visual Emotion Challenge and workshop, Proceedings of the 22nd ACM International Conference on Multimedia, pp.1243-1244, 2014.

L. van der Maaten and K. Weinberger, Stochastic triplet embedding, Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2012.

F. Weninger, F. Ringeval, E. Marchi, and B. Schuller, Discriminatively trained recurrent neural networks for continuous dimensional emotion recognition from audio, Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI). IJCAI/AAAI, pp.2196-2202, 2016.

G. N. Yannakakis, R. Cowie, and C. Busso, The ordinal nature of emotions, Proceedings of the 7th biannual Conference on Affective Computing and Intelligent Interaction (ACII), 2017.

R. C. Young, J. T. Biggs, V. E. Ziegler, and D. A. Meyer, A rating scale for mania: Reliability, validity and sensitivity, The British Journal of Psychiatry, vol.133, pp.429-435, 1978.
DOI : 10.1192/bjp.133.5.429

B. Zhang, E. M. Provost, and G. Essl, Cross-corpus acoustic emotion recognition with multi-task learning: Seeking common ground while preserving differences, IEEE Transactions on Affective Computing, vol.14, 2017.
DOI : 10.1109/taffc.2017.2684799

Z. Zhang, N. Cummins, and B. Schuller, Advanced data exploitation in speech analysis - An overview, IEEE Signal Processing Magazine, vol.34, issue.4, pp.107-129, 2017.