Identification of lexical areas templates throughout the Occitan domain

Guylaine Brun-Trigaud 1 Clément Chagnaud 2 Philippe Garat 3
1 BCL, équipe Dialectologie et Linguistique formelle
BCL - Bases, Corpus, Langage (UMR 7320 - UNS / CNRS)
2 LIG Laboratoire d'Informatique de Grenoble - STEAMER
LIG - Laboratoire d'Informatique de Grenoble
3 SVH - Statistique pour le Vivant et l’Homme
LJK - Laboratoire Jean Kuntzmann
Abstract: The project presented here began several years ago. Its purpose is to show the relationship between the limits of lexical areas and the limits of the dialectal areas traditionally defined in the Occitan domain. Based on data from regional language atlases implemented in the Thesaurus Occitan, a first study (Brun-Trigaud 2013) identified several types of areal layouts, as well as 8 templates into which a large number of Occitan lexical areas fit. This starting point led us to question how to name the entities that do not fit the traditional dialectal template. The work was then resumed on an extended database (Brun-Trigaud 2017) and processed in Gabmap, where analyses by multidimensional scaling and fuzzy clustering confirmed the results of our first approach. Within the ANR project ECLATS, other methods of multidimensional spatial analysis are currently being tested, allowing the lexical data to be explored from a complementary point of view. The proposed analytical procedure first computes the lexical areas associated with each lexical entry. The lexical areas are stored as spatial objects and become the statistical units of the subsequent spatial analyses, in contrast to the dialectometric approach, which takes survey points as its units. Recurrent areal layouts across the different lexical areas are identified by computing pairwise concordance indicators that measure whether two given spatial partitions are distributed independently or show high spatial concordance. Several concordance indicators have been tested and compared. Using various classification algorithms, we are able to define stable clusters, each of which yields a specific layout template of lexical areas.
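To make the notion of a pairwise concordance indicator concrete, the following is a minimal sketch and not the indicators actually used in the project, which the abstract does not name. It assumes, as a simplification, that each lexical map has been reduced to one area label per spatial unit, and it uses the adjusted Rand index as one possible indicator: values near 1 indicate concordant partitions, values near 0 (or below) indicate independent ones. The maps `map_a`, `map_b`, `map_c` and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same spatial units."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same spatial units")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two spatial units are required")
    # Pairs of units that fall in the same area in both maps.
    joint = Counter(zip(labels_a, labels_b))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    # Pairs of units that fall in the same area within each map.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one single area in both maps
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


def pairwise_concordance(maps):
    """Concordance value for every unordered pair of named lexical maps."""
    return {
        (name_a, name_b): adjusted_rand_index(maps[name_a], maps[name_b])
        for name_a, name_b in combinations(sorted(maps), 2)
    }


# Hypothetical toy data: three lexical maps over the same 8 spatial units.
map_a = ["w", "w", "w", "w", "e", "e", "e", "e"]  # west / east split
map_b = ["x", "x", "x", "x", "y", "y", "y", "y"]  # same split, other lexemes
map_c = ["p", "q", "p", "q", "p", "q", "p", "q"]  # unrelated layout

concordance = pairwise_concordance(
    {"map_a": map_a, "map_b": map_b, "map_c": map_c}
)
```

In this toy example the first two maps share the same layout despite carrying different lexemes, while the third is unrelated to both. A matrix of such values could then be passed to a classification algorithm to group maps into layout templates.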
We also propose a new methodology of spatial analysis to cluster the lexical areas according to historical or geographic criteria (old provincial areas, mountain ranges, forest areas, …) and to detect spatial synchrony between them. The process consists of the following steps:
(i) Discretization of the studied territory into a lattice of spatial units, or cells.
(ii) Projection of these spatial units into a synthetic low-dimensional representation space, using Correspondence Analysis or Multiple Correspondence Analysis (Rencher 2002). The factorial axes are spanned according to the chosen historical or geographic criteria.
(iii) Plotting of the lexical areas in the synthetic space as barycentric centroids.
(iv) Clustering of the lexical barycentric centroids.
(v) Reconstruction of the templates using a reverse process.
The recent spatial analysis tools of the ECLATS project open new perspectives in the field of dialectology regarding the classification of lexical areas into spatial templates.

References
Brun-Trigaud, G. & Malfatto, A. (2013). Limites dialectales vs limites lexicales dans le domaine occitan: un impossible accord?. In Carrilho, E. (ed.): Current Approaches to Limits and Areas in Dialectology: 293-310. Cambridge Scholars Publishers.
Brun-Trigaud, G., Malfatto, A. & Sauzet, M. (2017). Essai de typologie des aires lexicales occitanes: regards dialectométriques. Fidélités et dissidences. Actes du 12e Congrès de l’Association Internationale d'Etudes Occitanes (Albi, 2017). (to be published)
Rencher, Alvin C. (2002). Methods of multivariate analysis. Second edition. Wiley-Interscience, Hoboken, NJ, USA.
Conference paper
10th International Conference on Language Variation in Europe (ICLaVE|10), Jun 2019, Leeuwarden, Netherlands
Guylaine Brun-Trigaud, Clément Chagnaud, Philippe Garat. Identification of lexical areas templates throughout the Occitan domain. 10th International Conference on Language Variation in Europe (ICLaVE|10), Jun 2019, Leeuwarden, Netherlands.



