Deep learning versus conventional machine learning for detection of healthcare-associated infections in French clinical narratives

Objective: The objective of this article was to compare deep learning and conventional machine learning (ML) methods for detecting health care-associated infections (HAIs) in French medical reports.

Methods: The corpus consisted of different types of medical reports (discharge summaries, surgery reports, consultation reports, etc.). A total of 1,531 medical text documents were extracted and deidentified in three French university hospitals. Each document was labeled for the presence (1) or absence (0) of an HAI. The records were first normalized using a series of preprocessing techniques. We used the F1 score as the overall performance metric to compare a deep learning method (convolutional neural network [CNN]) with the most popular conventional ML models (Bernoulli and multinomial naïve Bayes, k-nearest neighbors, logistic regression, random forests, extra-trees, gradient boosting, and support vector machines). We applied Bayesian hyperparameter optimization to each model, based on its HAI identification performance. The text representation was included as an additional hyperparameter for each model, with four candidates: bag of words, term frequency–inverse document frequency (TF-IDF), word2vec, and GloVe.

Results: The CNN outperformed all conventional ML algorithms for HAI classification. The best F1 score (97.7% ± 3.6%) and the best area under the curve (AUC, 99.8% ± 0.41%) were achieved when the CNN was applied directly to the processed clinical notes without a pretrained word2vec embedding. Through receiver operating characteristic (ROC) curve analysis, using Youden's index as the reference for selecting the operating point, we achieved a good balance between false notifications (specificity 0.937) and detection capability (sensitivity 0.962).

Conclusions: The main drawback of CNNs is their opacity. To address this issue, we examined the activation values of the CNN's inner layers to visualize the most meaningful phrases in a document. This method could be used to build a phrase-based medical assistant algorithm that helps infection control practitioners select relevant medical records. Our study demonstrated that a deep learning approach outperforms other classification learning algorithms for automatically identifying HAIs in medical reports.
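The Results paragraph reports an operating point chosen with Youden's index on the ROC curve, alongside the F1 score. The following is a minimal, hypothetical sketch, not the authors' code. It assumes a classifier that outputs a probability-like HAI score for each document. Every distinct score is tried as a candidate threshold, sensitivity, specificity, and F1 are computed at each one, and the threshold with the largest Youden's J = sensitivity + specificity - 1 is kept. The function names and the eight toy labels and scores are invented for illustration and are not data from the study.

```python
from typing import Dict, Optional, Sequence, Tuple


def confusion_counts(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), flagging a document as HAI (1) when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:  # flagged, but no HAI: false notification
            fp += 1
        elif y == 0:
            tn += 1
        else:  # missed HAI
            fn += 1
    return tp, fp, tn, fn


def metrics_at(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, F1 and Youden's J at one threshold (0.0 when undefined)."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "youden_j": sensitivity + specificity - 1.0,
    }


def youden_threshold(
    labels: Sequence[int], scores: Sequence[float]
) -> Tuple[Optional[float], float]:
    """Return the (threshold, J) pair maximising Youden's J over all distinct scores.

    Candidates are visited in ascending order and only a strictly larger J
    replaces the current best, so ties keep the lowest threshold.
    """
    best_t: Optional[float] = None
    best_j = float("-inf")
    for t in sorted(set(scores)):
        j = metrics_at(labels, scores, t)["youden_j"]
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical toy data: 1 = HAI present, 0 = absent (not from the study).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.20, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]

threshold, j = youden_threshold(labels, scores)
print(threshold, j)  # → 0.4 0.75
print(metrics_at(labels, scores, threshold))
```

In the toy data, thresholds 0.4 and 0.7 both reach J = 0.75. The lowest-threshold tie-break chosen here favours sensitivity, which is one reasonable choice when missed infections cost more than false notifications. Other tie-break rules would be equally defensible.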

Keywords

Healthcare-associated infections; Epidemiology; Deep learning; Machine learning; Natural language processing; Electronic health records

Domains

Computation and Language [cs.CL]

Contributor: Marie-Hélène Metzger

https://hal.science/hal-02410077

Submitted on: Friday, 13 December 2019, 16:35:50

Last modified on: Thursday, 25 April 2024, 11:17:50

Dates and versions

hal-02410077, version 1 (13-12-2019)

Identifiers

HAL Id: hal-02410077, version 1
DOI: 10.1055/s-0039-1677692

Cite

Sara Rabhi, Jérémie Jakubowicz, Marie-Hélène Metzger. Deep learning versus conventional machine learning for detection of healthcare-associated infections in French clinical narratives. Methods of Information in Medicine, 2019, 58 (01), pp. 31-41. ⟨10.1055/s-0039-1677692⟩. ⟨hal-02410077⟩

Collections

INSTITUT-TELECOM UNIV-PARIS13 CNRS TELECOM-SUDPARIS CESP UVSQ USPC UNIV-PARIS-SACLAY SORBONNE-PARIS-NORD GS-SANTE-PUBLIQUE ACT-R