Computationally efficient discrimination between language varieties with large feature vectors and regularized classifiers

Adrien Barbaresi

Conference paper, Year: 2018


Adrien Barbaresi (1, 2)

Role: Author
PersonId: 1134
IdHAL: adrien-barbaresi
ORCID: 0000-0002-8079-8694

1 Österreichische Akademie der Wissenschaften

2 Berlin-Brandenburgische Akademie der Wissenschaften

Abstract

The present contribution revolves around efficient approaches to language classification that have been field-tested in the VarDial evaluation campaign. The methods used in several language identification tasks covering different types of languages are presented and their results are discussed, giving insights into the real-world application of regularization, linear classifiers and corresponding linguistic features. The use of a specially adapted Ridge classifier proved useful in two of the three tasks. The overall approach (XAC) slightly outperformed most of the other systems on the DFS task (Dutch and Flemish) and on the ILI task (Indo-Aryan languages), while its comparative performance was poorer on the GDI task (Swiss German dialects).
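To illustrate the general idea of a regularized linear classifier over large feature vectors, here is a minimal sketch in Python, assuming NumPy is available. It is not the author's specially adapted Ridge classifier or the XAC system. The character n-gram features, the L2 row normalization, the absence of an intercept term, the class name `DualRidgeClassifier`, the helper `char_ngrams` and the toy training strings are all hypothetical choices made for this example. The ridge problem is solved in its dual form. This requires only an n_samples × n_samples linear system, so it stays cheap when the feature space is much larger than the training set.

```python
from collections import Counter

import numpy as np


def char_ngrams(text, n_min=1, n_max=3):
    """Count character n-grams in a space-padded, lower-cased string (hypothetical feature choice)."""
    padded = f" {text.lower()} "
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


class DualRidgeClassifier:
    """One-vs-rest ridge classification solved in the dual (sketch, no intercept).

    Targets are +1 for the true class and -1 otherwise. The weights
    W = X^T (X X^T + alpha I)^{-1} Y equal the usual ridge solution
    (X^T X + alpha I)^{-1} X^T Y, but only an n_samples x n_samples
    system is solved, which is cheaper when feature vectors are very large.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # regularization strength

    def _vectorize(self, texts):
        rows = np.zeros((len(texts), len(self.vocab_)))
        for r, text in enumerate(texts):
            for gram, count in char_ngrams(text).items():
                col = self.vocab_.get(gram)
                if col is not None:  # n-grams unseen in training are ignored
                    rows[r, col] = count
        # L2-normalize each row so that text length does not dominate
        norms = np.linalg.norm(rows, axis=1, keepdims=True)
        return rows / np.where(norms == 0, 1.0, norms)

    def fit(self, texts, labels):
        grams = sorted({g for t in texts for g in char_ngrams(t)})
        self.vocab_ = {g: i for i, g in enumerate(grams)}
        self.classes_ = sorted(set(labels))
        X = self._vectorize(texts)
        Y = np.array([[1.0 if y == c else -1.0 for c in self.classes_]
                      for y in labels])
        gram_matrix = X @ X.T + self.alpha * np.eye(len(texts))
        dual_coef = np.linalg.solve(gram_matrix, Y)
        self.coef_ = X.T @ dual_coef  # shape: (n_features, n_classes)
        return self

    def predict(self, texts):
        scores = self._vectorize(texts) @ self.coef_
        return [self.classes_[i] for i in scores.argmax(axis=1)]


# Toy usage with invented strings from two disjoint alphabets
train_texts = ["aaaa aaa", "aa aaaa a", "bbbb bbb", "bb bbbb b"]
train_labels = ["A", "A", "B", "B"]
clf = DualRidgeClassifier(alpha=1.0).fit(train_texts, train_labels)
print(clf.predict(["aaa aa", "bbb"]))  # → ['A', 'B']
```

Each test string is assigned to the class whose training strings share its n-grams. A library implementation such as scikit-learn's `RidgeClassifier` also fits an intercept by default. That term is omitted here to keep the dual solution short.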

Subjects

Linguistics; Computation and Language [cs.CL]; Machine Learning [stat.ML]

Main file

Barbaresi_VarDial2018_regularized.pdf (168.72 KB)

Origin: Publisher files allowed on an open archive


https://hal.science/hal-01858444

Submitted on: Monday, 20 August 2018, 17:19:35

Last modified on: Wednesday, 12 December 2018, 13:32:04

Long-term archiving on: Wednesday, 21 November 2018, 14:20:09

Dates and versions

hal-01858444, version 1 (20-08-2018)

License

Attribution

Identifiers

HAL Id: hal-01858444, version 1

Cite

Adrien Barbaresi. Computationally efficient discrimination between language varieties with large feature vectors and regularized classifiers. Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, Aug 2018, Santa Fe, New Mexico, United States. pp. 164-171. ⟨hal-01858444⟩


