Journal article in Research in Computing Science. Year: 2010

POS-tagging for Oral Texts with CRF and Category Decomposition

Abstract

The ESLO (Enquête sociolinguistique d'Orléans, i.e. Sociolinguistic Survey of Orléans) campaign gathered a large oral corpus, which was later transcribed into text. The purpose of this work is to assign a morpho-syntactic label to each unit of this corpus. To this end, we first studied the specific requirements of labels for oral data and the various levels of description at which they can be defined. This led to a new hierarchical structure of labels. Since our new label set differed from those of existing taggers, which are usually not suited to oral data, we built a new labelling tool using a Machine Learning approach. As a starting point, we used data labelled by Cordial and corrected by hand. We used CRFs (Conditional Random Fields) to take the best possible advantage of the linguistic knowledge used to define the label set. We measure accuracy between 85% and 90%, depending on the parameters.
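To illustrate what evaluating a hierarchical label set at several levels of description could look like, here is a minimal Python sketch. It is not the authors' tool and does not use the ESLO tagset: the separator `-`, the example labels, and the function names `decompose`, `truncate`, and `accuracy_at_depth` are all hypothetical, chosen only to show how a composite label can be split into components and how accuracy can be measured at a coarser or finer granularity.

```python
from typing import List, Sequence

# Hypothetical separator between the components of a composite label.
SEP = "-"


def decompose(label: str) -> List[str]:
    """Split a composite label into its components, coarsest first."""
    return label.split(SEP)


def truncate(label: str, depth: int) -> str:
    """Keep only the first `depth` components of a composite label."""
    return SEP.join(decompose(label)[:depth])


def accuracy_at_depth(gold: Sequence[str], pred: Sequence[str], depth: int) -> float:
    """Fraction of tokens whose labels agree once truncated to `depth` components."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    hits = sum(truncate(g, depth) == truncate(p, depth) for g, p in zip(gold, pred))
    return hits / len(gold)


# Invented labels for illustration only (not taken from the paper).
gold = ["DET-masc-sing", "N-masc-sing", "V-pres-3sg", "INTJ"]
pred = ["DET-masc-sing", "N-masc-plur", "V-pres-3sg", "N-masc-sing"]

coarse = accuracy_at_depth(gold, pred, depth=1)  # main category only
fine = accuracy_at_depth(gold, pred, depth=3)    # full composite label
print(f"coarse accuracy: {coarse:.2f}, fine accuracy: {fine:.2f}")
```

In this toy data the coarse score is higher than the fine one, because an error on a sub-feature (here, number) leaves the main category correct. A real system would obtain `pred` from a trained sequence model such as a CRF rather than from a hand-written list.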
Main file
tellierEtAl2010-cicling.pdf (159.46 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00467951, version 1 (29-03-2010)

Identifiers

  • HAL Id: hal-00467951, version 1

Cite

Isabelle Tellier, Iris Eshkol, Samer Taalab, Jean-Philippe Prost. POS-tagging for Oral Texts with CRF and Category Decomposition. Research in Computing Science, 2010, 46, pp. 79-90. ⟨hal-00467951⟩