Conference paper, Year: 2022

The multidialectal corpus of the Crescent dialects: collection, exploitation and analysis

Abstract

Situated in northern Limousin and Auvergne (France), the linguistic Crescent is an area where local Gallo-Romance varieties simultaneously display typical Oïlic and Occitan features. One of the main aims of our research projects is to collect, exploit and analyse a multidialectal corpus before these varieties, now highly endangered, fall into oblivion. We also want to make the corpus accessible to both local communities and researchers. This corpus mainly contains raw linguistic data, either written or spoken: lexical items, morphological paradigms, original texts (belonging to various genres), translations (in particular of “The Little Prince”), and audiobooks. All these elements are associated with metadata providing information about the informant and the context in which the data were collected. The corpus has already been used for different kinds of work: grammatical descriptions, linguistic maps (for linguistic comparisons and variational approaches), morphological analysis (hierarchical clustering), phonetic comparison (mel-frequency cepstral coefficients), etc. Our corpus has several implications for the language sciences: it provides a considerable amount of data for under-described linguistic varieties, a large typological parallel corpus, and new elements for Romance linguistics. We now intend to develop it through the documentation of new varieties, the translation of more versions of “The Little Prince”, the recording of more audiobooks, the creation of more online tools, etc. We also intend to preserve the online corpus in a more durable archive, namely Cocoon (Digital Oral Corpus COllections).
No file deposited

Dates and versions

hal-03685122, version 1 (01-06-2022)

Identifiers

  • HAL Id: hal-03685122, version 1

Cite

Maximilien Guérin. The multidialectal corpus of the Crescent dialects: collection, exploitation and analysis. CLARIN café on Bilingual and Multilingual Corpora, Common Language Resources and Technology Infrastructure, Apr 2022, On-line, France. ⟨hal-03685122⟩