The contribution of the notion of hapax legomena to word alignment
Abstract
Current word alignment techniques disregard low-frequency words on the assumption that they are not useful. Contrary to this belief, this paper shows that hapax legomena in particular can contribute substantially to word alignment. In an experiment, we show that pairs of corpus hapaxes account for the majority of the best word alignments. In addition, we show that the notion of sentence hapax justifies a common practical simplification of a standard alignment method.
Origin: Files produced by the author(s)