Conference paper. Year: 2018

Discovering dialectal differences based on oral corpora

Vasilisa Andriyanets
  • Role: Author
Michael Daniel
  • Role: Author
Brigitte Pakendorf

Abstract

This paper discusses a method for detecting statistically significant linguistic differences between corpora while accounting for variability within the corpora being compared. Specifically, we compare two small corpora of dialects of Even, Bystraja and Lamunkhin Even, in an attempt to identify morphemes that are more frequent in one corpus than in the other. To check whether such a frequency difference might be due to the over-representation of a speaker who happens to be an outlier in the use of a particular morpheme, we use DP, a measure of how evenly a specific linguistic feature is distributed across the subcorpora of a corpus.
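
As a rough illustration of the dispersion check described above, the sketch below computes DP under the assumption that it refers to Gries's "deviation of proportions": half the sum of absolute differences between each subcorpus's share of the corpus and its share of the feature's occurrences. The function name, the per-speaker split, and the counts are hypothetical and are not taken from the paper, which may use a different variant of the measure.

```python
def deviation_of_proportions(part_sizes, feature_counts):
    """Return DP for one feature across the parts of a corpus.

    part_sizes     -- token counts per subcorpus (e.g. per speaker)
    feature_counts -- occurrences of the feature in each subcorpus

    Values near 0 mean the feature is spread in proportion to the
    subcorpus sizes; values near 1 mean it is concentrated in a few parts.
    """
    if len(part_sizes) != len(feature_counts):
        raise ValueError("part_sizes and feature_counts must have equal length")
    total_size = sum(part_sizes)
    total_count = sum(feature_counts)
    if total_size == 0 or total_count == 0:
        raise ValueError("corpus and feature totals must be non-zero")
    # Compare the expected share (by size) with the observed share (by count).
    return 0.5 * sum(
        abs(count / total_count - size / total_size)
        for size, count in zip(part_sizes, feature_counts)
    )


# Hypothetical per-speaker token counts and morpheme counts.
speaker_sizes = [1000, 1000, 2000]
even_use = deviation_of_proportions(speaker_sizes, [10, 10, 20])
one_speaker_only = deviation_of_proportions(speaker_sizes, [40, 0, 0])
print(even_use, one_speaker_only)
```

In this made-up example, a morpheme used by all three speakers in proportion to how much they speak gets a DP of 0, while a morpheme produced only by the first speaker gets a much higher value, flagging that a between-corpus frequency difference could be driven by a single outlier.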

Domains

Linguistics
Main file
Andriyanets_etal_2018_Even_corpus_DIALOG.pdf (900.88 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-01960505, version 1 (16-07-2020)

Identifiers

  • HAL Id: hal-01960505, version 1

Cite

Vasilisa Andriyanets, Michael Daniel, Brigitte Pakendorf. Discovering dialectal differences based on oral corpora. Computational Linguistics and Intellectual Technologies, 2018, Moscow, Russia. pp.24-34. ⟨hal-01960505⟩
61 views
16 downloads