Adaptation of a Term Extractor to Arabic Specialised Texts: First Experiments and Limits

In this paper, we present an adaptation of a French and English term extractor to Modern Standard Arabic. The goal of this work is to help address the lack of resources and NLP tools for Arabic in specialised domains. The adaptation first focuses on describing extraction processes similar to those already defined for French and English, while taking into account the morpho-syntactic specificities of Arabic. Agglutination phenomena are also handled in the term extraction process. The current state of the adapted system was evaluated on a corpus of medical texts: of the 400 maximal candidate terms examined, 288 were correct (72% precision). An error analysis shows that term extraction errors are primarily due to Part-of-Speech tagging errors and to the difficulties induced by non-diacritised texts, and secondarily to remaining agglutination phenomena.
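The abstract does not detail the extraction procedure. As an illustration only, the following Python sketch shows one common way a pattern-based extractor can collect maximal candidate terms from POS-tagged Arabic text, and it reproduces the precision arithmetic above (288 / 400 = 0.72). It is not the authors' system. The simplified tag set, the pattern NOUN (NOUN|ADJ)*, the restriction to multi-word candidates, the function names and the Buckwalter-transliterated example tokens are all assumptions. The pattern is head-initial because Arabic adjectives and construct-state (idafa) complements follow the head noun. The sketch also assumes that clitics such as the conjunction w+ have already been split off. That is exactly the agglutination problem the abstract mentions.

```python
from typing import List, Tuple

# (surface form, POS tag); the tag names are a hypothetical simplified set.
Token = Tuple[str, str]


def maximal_candidates(tokens: List[Token]) -> List[str]:
    """Collect maximal multi-word runs matching NOUN (NOUN|ADJ)*.

    Assumes the input is already clitic-segmented and POS-tagged.
    """
    candidates: List[str] = []
    current: List[str] = []
    for form, tag in tokens:
        # A run must start with a noun; nouns and adjectives may extend it.
        if tag == "NOUN" or (tag == "ADJ" and current):
            current.append(form)
            continue
        # Any other tag closes the current run; keep it if multi-word.
        if len(current) > 1:
            candidates.append(" ".join(current))
        current = []
    # Flush a run that reaches the end of the sentence.
    if len(current) > 1:
        candidates.append(" ".join(current))
    return candidates


def precision(correct: int, examined: int) -> float:
    """Share of examined candidates judged correct."""
    return correct / examined


if __name__ == "__main__":
    # Hypothetical tagged sentence (Buckwalter transliteration):
    # ySyb "affects", srTAn Alvdy Alxbyv "malignant breast cancer",
    # w "and" (split-off proclitic), Alrjl "the man".
    sentence = [("ySyb", "VERB"), ("srTAn", "NOUN"), ("Alvdy", "NOUN"),
                ("Alxbyv", "ADJ"), ("w", "CONJ"), ("Alrjl", "NOUN")]
    print(maximal_candidates(sentence))  # ['srTAn Alvdy Alxbyv']
    print(precision(288, 400))           # 0.72
```

If the proclitic w+ had been left attached (wAlrjl), a tagger working on unsegmented, non-diacritised forms could easily mis-tag it. This illustrates why segmentation and tagging errors propagate directly into the candidate list.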

Keywords

Terminology, Term Extraction, Modern Standard Arabic

Domains

Computer Science [cs]; Computation and Language [cs.CL]

https://hal.science/hal-01771875

Submitted on: Friday 20 April 2018, 04:53:19

Last modified on: Saturday 7 October 2023, 21:36:20

Dates and versions

hal-01771875, version 1 (20-04-2018)

Identifiers

HAL Id: hal-01771875, version 1

Cite

Wafa Neifar, Thierry Hamon, Pierre Zweigenbaum, Mariem Ellouze Khemakhem, Lamia Hadrich Belguith. Adaptation of a Term Extractor to Arabic Specialised Texts: First Experiments and Limits. International Conference on Intelligent Text Processing and Computational Linguistics, Springer, Jan 2016, Konya, Turkey. ⟨hal-01771875⟩

Collections

UNIV-PARIS13 CNRS LIMSI USPC UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE SORBONNE-PARIS-NORD LISN GS-ENGINEERING GS-COMPUTER-SCIENCE
