Book chapter. Year: 2014

AixOx, a multi-layered learners corpus: automatic annotation

Sophie Herment
Brigitte Bigi
Daniel J. Hirst
Anastassia Loukina
  • Role: Author

Abstract

This paper presents a multilingual learners corpus, AixOx, collected in the framework of an Alliance project (a partnership between the British Council and the French Ministry of Foreign Affairs). The corpus consists of recordings of 40 one-minute passages in English and French from the Eurom 1 corpus (Chan et al., 1995), read by native speakers and L2 learners. French native speakers reading the French and English passages were recorded in Aix-en-Provence, and English native speakers reading the English and French passages were recorded in Oxford. The AixOx corpus contains about 40 hours of read speech and can be downloaded from the “Speech and Language Data Repository” (http://sldr.org).

This paper also presents the tools used for automatic annotation on several layers:

  • SPPAS (SPeech Phonetization Alignment and Syllabification; Bigi, 2012) for segmentation into utterances, words, syllables and phonemes;
  • MoMel (Modelling Melody) and INTSINT (INternational Transcription System for INTonation) (Hirst, 2007) for the modelling and coding of intonation.

Finally, an example of a pedagogical application of the corpus is given: a pilot study on the intonation of questions. We show how the AixOx corpus can be used to compare the productions of native speakers with those of learners, and how the annotation makes it possible to understand and explain the prosodic realisations, whether positive or negative. We conclude that AixOx, with its multi-layered annotation, is a very rich oral database for all kinds of studies on L1 productions, L2 productions and language contact, at both the segmental and supra-segmental levels, since it offers phonemic segmentation and alignment as well as prosodic labelling.

Subject areas

Linguistics
Main file
Herment_et_al_PeterLang_FINAL_version HAL.pdf (1.72 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01363434, version 1 (23-10-2019)

Identifiers

  • HAL Id: hal-01363434, version 1

Cite

Sophie Herment, Anne Tortel, Brigitte Bigi, Daniel J. Hirst, Anastassia Loukina. AixOx, a multi-layered learners corpus: automatic annotation. Díaz Pérez J.; Díaz Negrillo A. Specialisation and variation in language corpora, Linguistic insights (179), Peter Lang, pp.41-76, 2014, 978-3035107135. ⟨hal-01363434⟩
332 views
112 downloads
