Journal article, LTC'07, Year: 2007

The contribution of the notion of hapax legomena to word alignment

Abstract

Current word alignment techniques disregard low-frequency words on the assumption that they are not useful. Against this belief, this paper shows that the notion of hapax legomena in particular may contribute to word alignment to a large extent. In an experiment, we show that pairs of corpus hapaxes contribute the majority of the best word alignments. In addition, we show that the notion of sentence hapax justifies a practical and common simplification of a standard alignment method.
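To make the two notions in the abstract concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' implementation: it assumes whitespace tokenization, reads "corpus hapax" as a word occurring exactly once in a whole corpus and "sentence hapax" as a word occurring exactly once in a given sentence, and pairs source and target corpus hapaxes simply because they appear in the same aligned sentence pair. The function names and the toy sentences are hypothetical.

```python
from collections import Counter


def corpus_hapaxes(sentences):
    """Return the words that occur exactly once in the whole corpus."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    return {word for word, n in counts.items() if n == 1}


def sentence_hapaxes(sentence):
    """Return the words that occur exactly once in a single sentence."""
    counts = Counter(sentence.split())
    return {word for word, n in counts.items() if n == 1}


def hapax_pairs(source_sentences, target_sentences):
    """Pair corpus hapaxes that co-occur in the same aligned sentence pair.

    Assumption: sentence i of the source is a translation of sentence i
    of the target. This naive pairing is a stand-in for a real aligner.
    """
    src_hapaxes = corpus_hapaxes(source_sentences)
    tgt_hapaxes = corpus_hapaxes(target_sentences)
    pairs = []
    for src, tgt in zip(source_sentences, target_sentences):
        src_words = [w for w in src.split() if w in src_hapaxes]
        tgt_words = [w for w in tgt.split() if w in tgt_hapaxes]
        pairs.extend((s, t) for s in src_words for t in tgt_words)
    return pairs


# Toy parallel corpus (hypothetical data, for illustration only).
source = ["the cat sleeps", "the dog sleeps"]
target = ["le chat dort", "le chien dort"]

print(hapax_pairs(source, target))
# → [('cat', 'chat'), ('dog', 'chien')]
```

In this toy corpus the frequent words ("the", "sleeps", "le", "dort") are ignored, and the only candidate pairs come from words seen once on each side, which is the intuition behind using corpus hapaxes as alignment evidence.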

Domains

Other [cs.OH]
Main file
ltc-039-Lardilleux.pdf (122.77 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-00252026, version 1 (12-02-2008)
hal-00252026, version 2 (17-03-2009)

Identifiers

  • HAL Id: hal-00252026, version 1

Cite

Adrien Lardilleux, Yves Lepage. The contribution of the notion of hapax legomena to word alignment. LTC'07, 2007, pp.0. ⟨hal-00252026v1⟩
