IRISA participation to BioNLP-ST 2013: lazy-learning and information retrieval for information extraction tasks
Abstract
This paper describes the information extraction techniques developed for the participation of IRISA-TexMex in the following BioNLP-ST13 tasks: Bacterial Biotope subtasks 1 and 2, and Graph Regulation Network. The approaches are general-purpose: they rely on neither specialized preprocessing nor specialized external data, and they are expected to work independently of the domain of the texts processed. They follow a classical machine learning setting, but emphasize similarity measures inherited from information retrieval (Okapi-BM25 (Robertson et al., 1998), language modeling (Hiemstra, 1998)). The good results obtained on these tasks show that these simple settings are competitive, provided that the representation and the similarity measure are well suited to the task.
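As an illustration of the general idea of lazy learning with an information retrieval similarity, the following is a minimal sketch of a k-nearest-neighbour classifier that ranks labelled training examples by Okapi-BM25 score against a query. It is not the system described in the paper: the function names, the toy examples, the labels, the parameter values (`k1`, `b`, `k`) and the score-weighted vote are all assumptions made for the example, and the IDF uses a common non-negative variant rather than the original formulation.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    Assumption: the IDF is log(1 + (N - df + 0.5) / (df + 0.5)), a
    non-negative variant chosen here for simplicity.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in set(query):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


def knn_label(query, docs, labels, k=3):
    """Lazy learning: no model is trained; the k most similar stored
    examples vote for their label, weighted by their BM25 score."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
    votes = Counter()
    for i in ranked:
        votes[labels[i]] += scores[i]
    return votes.most_common(1)[0][0]


# Hypothetical toy training set (token lists and labels are invented).
train_docs = [
    ["isolated", "from", "human", "gut"],
    ["found", "in", "soil", "samples"],
    ["strain", "of", "escherichia", "coli"],
    ["lactobacillus", "strain", "culture"],
]
train_labels = ["habitat", "habitat", "bacterium", "bacterium"]

prediction = knn_label(["isolated", "from", "soil"], train_docs, train_labels)
```

In this sketch, all the work happens at query time, which is what makes the approach "lazy": changing the representation (the tokens) or the similarity function changes the classifier without any retraining.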
Origin: Publisher files allowed on an open archive