The Nijmegen Corpus of Casual French - HAL open archive
Journal article, Speech Communication, Year: 2010

The Nijmegen Corpus of Casual French

Francisco Torreira
  • Role: Corresponding author
  • PersonId : 905464

Martine Adda-Decker
Mirjam Ernestus
  • Role: Author

Abstract

This article describes the preparation, recording and orthographic transcription of a new speech corpus, the Nijmegen Corpus of Casual French (NCCFr). The corpus contains a total of over 36 hours of recordings of 46 French speakers engaged in conversations with friends. Casual speech was elicited during three different parts, which together provided around ninety minutes of speech from every pair of speakers. While Parts 1 and 2 did not require participants to perform any specific task, in Part 3 participants negotiated a common answer to general questions about society. Comparisons with the ESTER corpus of journalistic speech show that the two corpora contain speech of considerably different registers. A number of indicators of casualness, including swear words, casual words, disfluencies and word repetitions, are more frequent in the NCCFr than in the ESTER corpus, while the use of double negation, an indicator of formal speech, is less frequent. In general, these estimates of casualness are constant through the three parts of the recording sessions and across speakers. Based on these facts, we conclude that our corpus is a rich resource of highly casual speech, and that it can be effectively exploited by researchers in language science and technology.

Main file
PEER_stage2_10.1016%2Fj.specom.2009.10.004.pdf (1.09 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-00608402, version 1 (13-07-2011)

Cite

Francisco Torreira, Martine Adda-Decker, Mirjam Ernestus. The Nijmegen Corpus of Casual French. Speech Communication, 2010, 52 (3), pp.201. ⟨10.1016/j.specom.2009.10.004⟩. ⟨hal-00608402⟩

Collections

PEER
260 views
267 downloads
