Journal article, Methods of Information in Medicine, Year: 2019

Deep learning versus conventional machine learning for detection of healthcare-associated infections in French clinical narratives

Abstract

Objective: To compare the performance of deep learning and conventional machine learning (ML) methods in detecting health care-associated infections (HAIs) in French medical reports.

Methods: The corpus consisted of several types of medical reports (discharge summaries, surgery reports, consultation reports, etc.). A total of 1,531 medical text documents were extracted and deidentified in three French university hospitals. Each document was labeled for the presence (1) or absence (0) of HAI. We first normalized the records using a series of preprocessing techniques. We used an overall performance metric, the F1 score, to compare a deep learning method (convolutional neural network [CNN]) with the most popular conventional ML models (Bernoulli and multinomial naïve Bayes, k-nearest neighbors, logistic regression, random forests, extra-trees, gradient boosting, support vector machines). We tuned the hyperparameters of each model by Bayesian optimization, based on its HAI identification performance. The text representation was treated as an additional hyperparameter for each model, chosen among four options (bag of words, term frequency–inverse document frequency, word2vec, and GloVe).

Results: The CNN outperformed all conventional ML algorithms for HAI classification. The best F1 score of 97.7% ± 3.6% and the best area under the curve of 99.8% ± 0.41% were achieved when the CNN was applied directly to the processed clinical notes without a pretrained word2vec embedding. Through receiver operating characteristic curve analysis, using Youden's index as the reference, we achieved a good balance between false notifications (specificity of 0.937) and detection capability (sensitivity of 0.962).

Conclusions: The main drawback of CNNs is their opacity. To address this issue, we investigated the activation values of the CNN's inner layers to visualize the most meaningful phrases in a document. This method could be used to build a phrase-based medical assistant algorithm to help infection control practitioners select relevant medical records. Our study demonstrated that a deep learning approach outperforms the other classification algorithms tested for automatically identifying HAIs in medical reports.
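
The abstract reports an operating point chosen with Youden's index, J = sensitivity + specificity - 1; for the reported values this gives J = 0.962 + 0.937 - 1 = 0.899. The sketch below illustrates how such a threshold might be selected from classifier scores, and how the F1 score is computed at that threshold. It is a minimal illustration, not the authors' implementation: the function names and the toy labels and scores are hypothetical, and it assumes the classifier outputs one probability-like score per document.

```python
from typing import List, Tuple


def confusion_counts(labels: List[int], scores: List[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), predicting HAI (1) when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_threshold(labels: List[int],
                     scores: List[float]) -> Tuple[float, float, float, float]:
    """Scan every observed score as a candidate threshold and return
    (threshold, J, sensitivity, specificity) for the largest J.
    Assumes both classes are present in `labels`."""
    best = None
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(labels, scores, t)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[1]:
            best = (t, j, sensitivity, specificity)
    return best


def f1_at_threshold(labels: List[int], scores: List[float],
                    threshold: float) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp, fp, _tn, fn = confusion_counts(labels, scores, threshold)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical toy data: 1 = HAI present, 0 = HAI absent.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
toy_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.58, 0.60, 0.80, 0.90, 0.95]

threshold, j, sens, spec = youden_threshold(toy_labels, toy_scores)
print(f"threshold={threshold:.2f} J={j:.3f} "
      f"sensitivity={sens:.3f} specificity={spec:.3f} "
      f"F1={f1_at_threshold(toy_labels, toy_scores, threshold):.3f}")
```

Maximizing J weights sensitivity and specificity equally; a surveillance team that tolerates more false notifications in exchange for fewer missed infections could instead pick a lower threshold from the same scan.
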
No file deposited

Dates and versions

hal-02410077 , version 1 (13-12-2019)

Cite

Sara Rabhi, Jérémie Jakubowicz, Marie-Hélène Metzger. Deep learning versus conventional machine learning for detection of healthcare-associated infections in French clinical narratives. Methods of Information in Medicine, 2019, 58 (01), pp.031-041. ⟨10.1055/s-0039-1677692⟩. ⟨hal-02410077⟩