E. M. Bender and B. Friedman, Data statements for natural language processing: Toward mitigating system bias and enabling better science, Transactions of the Association for Computational Linguistics, vol.6, pp.587-604, 2018.

T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai, Man is to computer programmer as woman is to homemaker? Debiasing word embeddings, Proceedings of NeurIPS, pp.4349-4357, 2016.

J. Buolamwini and T. Gebru, Gender shades: Intersectional accuracy disparities in commercial gender classification, Proceedings of FAT 2018 (Fairness, Accountability and Transparency), pp.77-91, 2018.

L. Cai and Y. Zhu, The challenges of data quality and data quality assessment in the big data era, Data Science Journal, vol.14, 2015.

A. Caliskan, J. J. Bryson, and A. Narayanan, Semantics derived automatically from language corpora contain human-like biases, Science, vol.356, issue.6334, pp.183-186, 2017.

A. Couillault, K. Fort, G. Adda, and H. de Mazancourt, Evaluating corpora documentation with regards to the Ethics and Big Data Charter, Proceedings of LREC 2014 (Language Resources and Evaluation), p.251, 2014.

D. Doukhan and J. Carrive, Description automatique du taux d'expression des femmes dans les flux télévisuels français, Proceedings of JEP 2018 (Journées d'Études sur la Parole), pp.496-504, 2018.

M. Garnerin, S. Rossato, and L. Besacier, Gender representation in French broadcast corpora and its impact on ASR performance, Proceedings of AI4TV 2019 (Workshop on AI for Smart TV Content Production, Access and Delivery), pp.3-9, 2019.

Google, Crowdsourced high-quality UK and Ireland English Dialect speech data set, 2019.

F. Hernandez, V. Nguyen, S. Ghannay, N. Tomashenko, and Y. Estève, TED-LIUM 3: Twice as much data and corpus repartition for experiments on speaker adaptation, Proceedings of SPECOM 2018 (Speech and Computer), pp.198-208, 2018.

C. D. Hernandez-Mena, TEDx Spanish Corpus. Audio and transcripts in Spanish taken from the TEDx Talks, 2019.

D. Hovy and S. L. Spruit, The social impact of Natural Language Processing, Proceedings of ACL 2016, vol.2, pp.591-598, 2016.

S. S. Juan, L. Besacier, B. Lecouteux, and M. Dyab, Using resources from a closely-related language to develop ASR for a very under-resourced language: a case study for Iban, Proceedings of INTERSPEECH 2015 (International Speech Communication Association), pp.1270-1274, 2015.
URL: https://hal.archives-ouvertes.fr/hal-01170493

M. Korvas, O. Plátek, O. Dušek, L. Žilka, and F. Jurčíček, Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license, Proceedings of LREC 2014 (Language Resources and Evaluation), pp.4423-4428, 2014.

S. Macharia, L. Ndangam, M. Saboor, E. Franke, S. Parr, and E. Opoku, Who makes the news, Global Media Monitoring Project (GMMP), 2015.

M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson et al., Model cards for model reporting, Proceedings of FAT 2019 (Fairness, Accountability and Transparency), pp.220-229, 2019.

C. Nass and S. Brave, Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship, 2005.

V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, Librispeech: an ASR corpus based on public domain audio books, Proceedings of ICASSP 2015 (Acoustics, Speech and Signal Processing), pp.5206-5210, 2015.

J. Stacey and B. Thorne, The missing feminist revolution in sociology, Social Problems, vol.32, issue.4, pp.301-316, 1985.

E. Vanmassenhove, C. Hardmeier, and A. Way, Getting gender right in neural machine translation, Proceedings of EMNLP 2018 (Empirical Methods in Natural Language Processing), pp.3003-3008, 2018.

M. West, R. Kraut, and H. Ei Chew, I'd blush if I could: closing gender divides in digital skills through education, 2019.

M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton et al., The FAIR guiding principles for scientific data management and stewardship, Scientific Data, vol.3, 2016.