What can millions of laboratory test results tell us about the temporal aspect of data quality? Study of data spanning 17 years in a clinical data warehouse

Abstract : Objective: To identify common temporal evolution profiles in biological data and propose a semi-automated method to identify these patterns in a clinical data warehouse (CDW). Materials and Methods: We leveraged the CDW of the European Hospital Georges Pompidou and tracked the evolution of 192 biological parameters over a period of 17 years (more than 445,000 patients and 131 million laboratory test results). Results: We identified three common profiles of evolution: discretization, breakpoints, and trends. We developed computational and statistical methods to identify these profiles in the CDW. Overall, of the 192 observed biological parameters (87,814,136 values), 135 presented at least one evolution. We identified breakpoints in 30 distinct parameters, discretizations in 32, and trends in 79. Discussion and conclusion: Our method allowed the identification of several temporal events in the data. Considering the distribution of these events over time, we identified probable causes for the observed profiles: instrument or software upgrades and changes in computation formulas. We evaluated the potential impact on data reuse. Finally, we formulated recommendations to enable safe use and sharing of biological data collections and to limit the impact of data evolution in retrospective and federated studies (e.g. the annotation of laboratory parameters presenting breakpoints or trends).
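
The abstract names three evolution profiles (discretization, breakpoints, and trends) without detailing how each is detected. The following is a minimal illustrative sketch, not the authors' method: it assumes a parameter has already been summarised per period (for example, one median per month), and it uses simple stand-in heuristics chosen for this example. All function names and the `min_segment` parameter are hypothetical.

```python
from decimal import Decimal
from statistics import mean


def decimal_places(value):
    """Number of decimal digits needed to write `value` (0 for integers)."""
    exponent = Decimal(str(value)).normalize().as_tuple().exponent
    return max(0, -exponent)


def find_discretization(values_by_period):
    """Periods whose reporting precision differs from the previous period.

    Assumption: a change in the maximum number of decimals is used as a
    proxy for a change in how results are rounded or discretized.
    """
    periods = sorted(values_by_period)
    precision = {p: max(decimal_places(v) for v in values_by_period[p])
                 for p in periods}
    return [cur for prev, cur in zip(periods, periods[1:])
            if precision[cur] != precision[prev]]


def find_breakpoint(series, min_segment=3):
    """Best single mean-shift split of `series`.

    Returns (k, shift), where k minimises the summed within-segment squared
    error and shift = mean(series[k:]) - mean(series[:k]).
    """
    def sse(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs)

    candidates = range(min_segment, len(series) - min_segment + 1)
    k = min(candidates, key=lambda i: sse(series[:i]) + sse(series[i:]))
    return k, mean(series[k:]) - mean(series[:k])


def trend_slope(series):
    """Ordinary least-squares slope of `series` against its index."""
    n = len(series)
    x_bar = (n - 1) / 2
    y_bar = mean(series)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(series))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


if __name__ == "__main__":
    # Synthetic per-period summaries, invented purely for illustration.
    medians = [5.0, 5.1, 4.9, 5.0, 6.0, 6.1, 5.9, 6.0]
    print(find_breakpoint(medians))
    print(trend_slope([1.0, 2.0, 3.0, 4.0]))
    print(find_discretization({2001: [1.25, 3.5], 2002: [1.3, 2.75], 2003: [1.0, 2.0]}))
```

In practice, each flagged event would still need review against laboratory records (instrument changes, software upgrades, formula changes) before being annotated, and thresholds for what counts as a meaningful shift or slope would have to be set per parameter.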

https://hal.archives-ouvertes.fr/hal-01978796
Contributor : Vincent Looten
Submitted on : Friday, January 11, 2019 - 5:53:57 PM
Last modification on : Thursday, June 13, 2019 - 11:14:03 AM

File

BioQuality_preprint.pdf
Files produced by the author(s)

Citation

Vincent Looten, Liliane Kong Win Chang, Antoine Neuraz, Marie-Anne Landau-Loriot, Benoit Vedie, et al. What can millions of laboratory test results tell us about the temporal aspect of data quality? Study of data spanning 17 years in a clinical data warehouse. Computer Methods and Programs in Biomedicine, Elsevier, 2018, ⟨10.1016/j.cmpb.2018.12.030⟩. ⟨hal-01978796⟩