Combining an expert-based medical entity recognizer to a machine-learning system: methods and a case-study

Medical entity recognition is now generally performed by data-driven methods based on supervised machine learning. Expert-based systems, where linguistic and domain expertise are directly provided to the system, for instance in the form of lexicons and pattern-based rules, are often combined with data-driven systems. We present a case study in which an existing expert-based medical entity recognition system, Ogmios, is combined with a data-driven system, Caramba, based on a linear-chain Conditional Random Field (CRF) classifier. We examine different methods for combining two such systems and test the most relevant ones through experiments performed on the i2b2/VA 2012 challenge data. Our case study specifically highlights the risk of overfitting incurred by an expert-based system. We observe that this overfitting prevents the combination of the two systems from obtaining improvements in precision, recall, or F-measure, and we analyse the underlying mechanisms through a post-hoc feature-level analysis. We also observe that wrapping the expert-based system alone as attributes input to a CRF classifier boosts its F-measure from 0.603 to 0.710 (strict matching of types and boundaries, as per the conlleval program), bringing it on par with the data-driven system. The generality of this method remains to be investigated further.
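
The idea of feeding an expert-based system's output to a CRF as attributes can be pictured with a small sketch. The code below is only an illustration under stated assumptions, not the authors' implementation: the function names, the feature names, the BIO tag scheme, the `PROBLEM` label, and the one-token context window are all hypothetical choices made for the example. It shows how per-token feature dictionaries might carry the expert system's tag alongside ordinary surface features.

```python
from typing import Dict, List


def token_features(tokens: List[str], expert_tags: List[str], i: int) -> Dict[str, str]:
    """Build features for token i: surface cues plus the expert system's tag.

    All feature names here are hypothetical; they are not taken from the paper.
    """
    feats = {
        "bias": "1",
        "word.lower": tokens[i].lower(),
        "word.istitle": str(tokens[i].istitle()),
        # The expert system's decision for this token, used as an attribute.
        "expert.tag": expert_tags[i],
    }
    # Expert tags of the neighbouring tokens (assumed window of one token).
    feats["expert.prev"] = expert_tags[i - 1] if i > 0 else "BOS"
    feats["expert.next"] = expert_tags[i + 1] if i < len(tokens) - 1 else "EOS"
    return feats


def sentence_features(tokens: List[str], expert_tags: List[str]) -> List[Dict[str, str]]:
    """Return one feature dictionary per token of the sentence."""
    if len(tokens) != len(expert_tags):
        raise ValueError("tokens and expert_tags must have the same length")
    return [token_features(tokens, expert_tags, i) for i in range(len(tokens))]


# Illustrative sentence with made-up expert output in BIO format.
tokens = ["Patient", "denies", "chest", "pain", "."]
expert_tags = ["O", "O", "B-PROBLEM", "I-PROBLEM", "O"]
X = sentence_features(tokens, expert_tags)
```

A sequence of such dictionaries, paired with gold labels, is the kind of input a linear-chain CRF toolkit could be trained on; the classifier is then free to learn when the expert system's tags can be trusted and when other features should override them.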

Keywords

Natural Language Processing; Information Extraction; Medical records; Machine Learning; Hybrid Methods; Overfitting

Domains

Computer Science [cs]; Computation and Language [cs.CL]

https://hal.science/hal-01972779

Submitted on: Monday, January 7, 2019, 21:29:47

Last modified on: Wednesday, February 28, 2024, 14:37:15

Dates and versions

hal-01972779, version 1 (07-01-2019)

Identifiers

HAL Id: hal-01972779, version 1

Cite

Pierre Zweigenbaum, Thomas Lavergne, Natalia Grabar, Thierry Hamon, Sophie Rosset, et al. Combining an expert-based medical entity recognizer to a machine-learning system: methods and a case-study. Biomedical Informatics Insights, 2013, 13p. ⟨hal-01972779⟩

Collections

UNIV-PARIS13 CNRS LIMSI STL UNIV-PARIS-SACLAY UNIV-LILLE SORBONNE-UNIVERSITE SORBONNE-PARIS-NORD LISN GS-SPORT-HUMAN-MOVEMENT ACT-R
