Conference paper. Proceedings of the 3rd Language and Technology Conference (LTC'07). Year: 2007

The contribution of the notion of hapax legomena to word alignment

Abstract

Current word alignment techniques disregard low-frequency words on the assumption that they are not useful. Against this belief, this paper shows that the notion of hapax legomena, in particular, can contribute substantially to word alignment. In an experiment, we show that pairs of corpus hapaxes account for the majority of the best word alignments. In addition, we show that the notion of sentence hapax justifies a practical and common simplification of a standard alignment method.
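
To make the notion of corpus hapax pairs concrete, here is a minimal sketch in Python. It is not the authors' method: it assumes a sentence-aligned, whitespace-tokenized parallel corpus, and the function names (`corpus_hapaxes`, `hapax_pair_candidates`) and the toy sentences are hypothetical, chosen only for illustration. It collects the words that occur exactly once on each side of the corpus and proposes an alignment candidate whenever an aligned sentence pair contains exactly one hapax on each side.

```python
from collections import Counter


def corpus_hapaxes(sentences):
    """Return the set of words that occur exactly once in the whole corpus."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    return {word for word, n in counts.items() if n == 1}


def hapax_pair_candidates(source_sentences, target_sentences):
    """Pair source and target corpus hapaxes found in the same aligned sentence pair.

    Assumption: sentence i of the source is the translation of sentence i
    of the target.
    """
    src_hapaxes = corpus_hapaxes(source_sentences)
    tgt_hapaxes = corpus_hapaxes(target_sentences)
    candidates = []
    for src, tgt in zip(source_sentences, target_sentences):
        src_words = [w for w in src.split() if w in src_hapaxes]
        tgt_words = [w for w in tgt.split() if w in tgt_hapaxes]
        # Keep only the unambiguous case: exactly one hapax on each side.
        if len(src_words) == 1 and len(tgt_words) == 1:
            candidates.append((src_words[0], tgt_words[0]))
    return candidates


# Toy, made-up parallel corpus (English / French), for illustration only.
english = ["the cat sleeps", "the dog sleeps", "the cat eats"]
french = ["le chat dort", "le chien dort", "le chat mange"]
pairs = hapax_pair_candidates(english, french)
```

On this toy corpus, only the words that appear once on each side ("dog"/"chien" and "eats"/"mange") are proposed as candidates. Frequent words such as "the" are left to other alignment evidence.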
Main file
ltc07-lardilleux.pdf (120.4 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00252026, version 1 (12-02-2008)
hal-00252026, version 2 (17-03-2009)

Identifiers

  • HAL Id: hal-00252026, version 2

Cite

Adrien Lardilleux, Yves Lepage. The contribution of the notion of hapax legomena to word alignment. The 3rd Language and Technology Conference (LTC'07), Oct 2007, Poznań, Poland. pp.458-462. ⟨hal-00252026v2⟩
