Increasing secondary diagnosis encoding quality using data mining techniques

Abstract : In order to measure medical activity, hospitals are required to manually encode information concerning each inpatient episode using the International Classification of Diseases (ICD-10). This task is time-consuming and requires substantial staff training. We propose to speed up and facilitate the tedious task of coding patient information, especially when coding secondary diagnoses that are not well described in medical resources such as discharge letters and medical records. Our approach leverages data mining techniques to explore medical databases of previously encoded secondary diagnoses. It uses the stored structured information (age, gender, diagnoses count, medical procedures...) to build a decision tree that either assigns the proper secondary diagnosis code to the corresponding inpatient episode or flags the inpatient episodes that contain implausible secondary diagnoses. The results suggest that better performance can be achieved by using a low level of diagnosis granularity, along with filters that balance the proportion of negative and positive examples in the training set. The evaluation scores vary widely across the studied diagnoses: the highest F1 score is 75% and the lowest is 25%, which indicates that further enhancements are needed to achieve good performance regardless of the encoded diagnosis. However, the average accuracy over all the studied secondary diagnoses is around 80%, which reflects better negative predictions. The approach could therefore be useful for preventing or detecting wrong secondary diagnosis assignments during the inpatient stay.
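To illustrate how an average accuracy near 80% can coexist with low F1 scores, and what a filter balancing negative and positive examples might look like, here is a minimal Python sketch. The function names `binary_scores` and `undersample_negatives`, the choice of random undersampling as the balancing filter, and the confusion-matrix counts are all hypothetical assumptions made for illustration; they are not the authors' implementation or reported results.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def binary_scores(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 for the positive class
    (here: the secondary diagnosis code is present in the episode)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def undersample_negatives(examples: Sequence[T], labels: Sequence[int],
                          seed: int = 0) -> Tuple[List[T], List[int]]:
    """Hypothetical balancing filter: keep every positive example and a
    random subset of negatives of the same size. Assumes 0/1 labels."""
    pos = [x for x, y in zip(examples, labels) if y == 1]
    neg = [x for x, y in zip(examples, labels) if y == 0]
    rng = random.Random(seed)
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    data = [(x, 1) for x in pos] + [(x, 0) for x in kept_neg]
    rng.shuffle(data)  # avoid all positives appearing first
    return [x for x, _ in data], [y for _, y in data]


if __name__ == "__main__":
    # Hypothetical test set: 1,000 episodes, 100 of which carry the code.
    scores = binary_scores(tp=25, fp=25, fn=75, tn=875)
    print({k: round(v, 3) for k, v in scores.items()})
    # {'accuracy': 0.9, 'precision': 0.5, 'recall': 0.25, 'f1': 0.333}
```

When positive examples are rare, the many true negatives keep accuracy high even though recall, and hence F1, stays low. Undersampling negatives before training the decision tree is one common way to counter that imbalance; the abstract does not specify which filters were actually used.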
Document type :
Conference papers

Cited literature : 27 references
Contributor : Open Archive Toulouse Archive Ouverte (oatao)
Submitted on : Wednesday, June 6, 2018 - 4:28:57 PM
Last modification on : Thursday, October 17, 2019 - 8:55:56 AM
Long-term archiving on : Friday, September 7, 2018 - 2:11:03 PM


Files produced by the author(s)


  • HAL Id : hal-01809380, version 1
  • OATAO : 19022


Ghazar Chahbandarian, Nathalie Souf, Rémi Bastide, Jean-Christophe Steinbach. Increasing secondary diagnosis encoding quality using data mining techniques. 10th IEEE International Conference on Research Challenges in Information Science (RCIS 2016), Jun 2016, Grenoble, France. pp. 1-10. ⟨hal-01809380⟩


