Phonetic corpora and big data
Abstract
In recent years, 'big data' has emerged as a trendy, highly promising catch-all term in economics and in high-tech domains such as information technology and speech processing. Big data are often described with a "3V" scheme of volume, variety and velocity: a huge volume of data, a wide variety of possibly unstructured and heterogeneous data sources, and a high frequency (velocity) of data generation over time. In this discussant session at ICPhS 2015 in Glasgow, we will examine the 'big data' concept as it applies to phonetics and the speech sciences at large. In this context, big data typically refer to huge, generally unstructured collections of speech or audio-visual data that were gathered before, and independently of, any phonetician's research hypotheses. Can such data benefit the phonetic sciences?