
The multidialectal corpus of the Crescent dialects: collection, exploitation and analysis

Abstract : Situated in northern Limousin and Auvergne (France), the linguistic Crescent is an area where local Gallo-Romance varieties simultaneously display typical Oïlic and Occitan features. One of the main aims of our research projects is to collect, exploit and analyse a multidialectal corpus before these varieties, now highly endangered, fall into oblivion. We also want to make the corpus accessible to both local communities and researchers. This corpus mainly contains raw linguistic data, either written or spoken: lexical items, morphological paradigms, original texts (belonging to various genres), translations (in particular of “The Little Prince”), and audiobooks. All these elements are associated with metadata providing information about the informant and the context in which the data were collected. The corpus has already been used for different kinds of work: grammatical descriptions, linguistic maps (for linguistic comparisons and variational approaches), morphological analysis (hierarchical clustering), phonetic comparison (mel-frequency cepstral coefficients), etc. Our corpus has several implications for the language sciences: it provides a considerable amount of data for under-described linguistic varieties, a large typological parallel corpus, and new elements for Romance linguistics. We now intend to develop it through the documentation of new varieties, the translation of more versions of “The Little Prince”, the recording of more audiobooks, the creation of more online tools, etc. We also intend to preserve the online corpus in a more durable archive, namely Cocoon (Digital Oral Corpus COllections).
Document type : Conference papers
Contributor : Maximilien Guérin
Submitted on : Wednesday, June 1, 2022 - 9:51:40 PM
Last modification on : Friday, June 3, 2022 - 3:12:55 AM


  • HAL Id : hal-03685122, version 1


Maximilien Guérin. The multidialectal corpus of the Crescent dialects: collection, exploitation and analysis. CLARIN café on Bilingual and Multilingual Corpora, Common Language Resources and Technology Infrastructure, Apr 2022, On-line, France. ⟨hal-03685122⟩


