Conference paper. Year: 2012

Automatic Compilation of Comparable corpora

Abstract

The exploitation of comparable corpora has proven to be a valuable alternative to scarce parallel corpora in various Natural Language Processing tasks. Many researchers have therefore stressed both the need for large quantities of such corpora and the scarcity of work on their compilation. This paper addresses the issue with a method based on cross-language information retrieval (CLIR) for the automatic acquisition of French-English comparable documents. At the start of the process, source documents are translated and their most representative terms are extracted. The resulting keyword list is then enlarged with synonyms, on the assumption that keyword expansion might improve the retrieval of comparable documents. Retrieval is performed on the indexed target collection, followed by a filtering step based mainly on temporal information and document length. Results are fair and suggest that the use of an ontology may improve the performance of the system.
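The stages named in the abstract (translation, keyword extraction, synonym expansion, retrieval, filtering) can be illustrated with a minimal Python sketch. This is not the author's implementation: the toy dictionary `FR_EN`, the synonym table `SYNONYMS`, the frequency-based keyword selection, the IDF-weighted overlap score, and the `max_days` and `max_len_ratio` thresholds are all assumptions introduced here for illustration. A real system would presumably rely on a translation resource, a thesaurus, and a proper retrieval engine.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    doc_id: str
    text: str
    published: date


# Hypothetical toy resources standing in for a translation step and a thesaurus.
FR_EN = {"président": "president", "vote": "vote", "campagne": "campaign"}
SYNONYMS = {"vote": ["ballot", "poll"]}
STOPWORDS = {"le", "la", "du", "de", "the", "a", "of", "in", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def translate(tokens):
    """Word-for-word lookup; tokens missing from the dictionary are dropped."""
    return [FR_EN[t] for t in tokens if t in FR_EN]


def top_keywords(tokens, k=3):
    """Assumed keyword extraction: keep the k most frequent terms."""
    return [w for w, _ in Counter(tokens).most_common(k)]


def expand(keywords):
    """Append known synonyms of each keyword to the query."""
    expanded = list(keywords)
    for w in keywords:
        expanded.extend(SYNONYMS.get(w, []))
    return expanded


def retrieve(query, collection):
    """Rank target documents by IDF-weighted overlap with the query terms."""
    n = len(collection)
    doc_tokens = {d.doc_id: set(tokenize(d.text)) for d in collection}
    df = Counter(t for toks in doc_tokens.values() for t in toks)
    scored = []
    for d in collection:
        matched = set(query) & doc_tokens[d.doc_id]
        score = sum(math.log(1 + n / df[t]) for t in matched)
        if score > 0:
            scored.append((score, d))
    return [d for _, d in sorted(scored, key=lambda p: p[0], reverse=True)]


def keep(source, candidate, max_days=7, max_len_ratio=2.0):
    """Filter on publication-date distance and document-length ratio."""
    days = abs((source.published - candidate.published).days)
    ls, lc = len(source.text.split()), len(candidate.text.split())
    ratio = max(ls, lc) / max(1, min(ls, lc))
    return days <= max_days and ratio <= max_len_ratio


def comparable_docs(source, collection):
    """Run the full pipeline for one source document."""
    query = expand(top_keywords(translate(tokenize(source.text))))
    return [d for d in retrieve(query, collection) if keep(source, d)]


source = Doc("fr1", "le président remporte le vote après la campagne", date(2011, 6, 1))
collection = [
    Doc("en1", "the president wins the poll after a close ballot", date(2011, 6, 2)),
    Doc("en2", "heavy rain expected in the north", date(2011, 6, 2)),
    Doc("en3", "president vote campaign recap", date(2010, 1, 1)),
]
print([d.doc_id for d in comparable_docs(source, collection)])
```

In this toy collection, `en3` shares several query terms with the source but is discarded by the date filter, which shows why a filtering step after retrieval can matter even when term overlap is high.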
Main file
Article_ManuelaYapomo_draftversion.pdf (205.93 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01073850 , version 1 (10-10-2014)

Identifiers

  • HAL Id : hal-01073850 , version 1

Cite

Manuela Yapomo. Automatic Compilation of Comparable corpora. Natural Language Processing and Human Language Technology, Jun 2011, Faro, Portugal. ⟨hal-01073850⟩