Journal article, International Journal of Data Mining, Modelling and Management, Year: 2009

Combining Sequence and Itemset Mining to Discover Named Entities in Biomedical Texts: A New Type of Pattern

Abstract

Biomedical named entity recognition (NER) is a challenging problem. In this paper, we show that mining techniques, such as sequential pattern mining and sequential rule mining, can be useful to tackle this problem but present some limitations. We demonstrate and analyse these limitations and introduce a new kind of pattern called LSR pattern that offers an excellent trade-off between the high precision of sequential rules and the high recall of sequential patterns. We formalise the LSR pattern mining problem first. Then we show how LSR patterns enable us to successfully tackle biomedical NER problems. We report experiments carried out on real datasets that underline the relevance of our proposition.
Main file
RIACL-PLANTEVIT-2009-3.pdf (471.16 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-01011378, version 1 (23-06-2014)

Cite

Marc Plantevit, Thierry Charnois, Jiri Kléma, Christophe Rigotti, Bruno Crémilleux. Combining Sequence and Itemset Mining to Discover Named Entities in Biomedical Texts: A New Type of Pattern. International Journal of Data Mining, Modelling and Management, 2009, 1 (2), pp.119-148. ⟨10.1504/IJDMMM.2009.026073⟩. ⟨hal-01011378⟩
287 views
251 downloads