S. R. Sukumar, R. Natarajan, and R. K. Ferrell, Quality of Big Data in health care, Int. J. Health Care Qual. Assur, vol.28, pp.621-634, 2015.

A. W. Toga and I. D. Dinov, Sharing big biomedical data, J. Big Data, vol.2, p.7, 2015.

S. N. Murphy, M. Mendis, K. Hackett, R. Kuttan, W. Pan et al., Architecture of the opensource clinical research chart from Informatics for Integrating Biology and the Bedside., Annu. Symp. Proceedings. AMIA Symp, pp.548-52, 2007.

, i2b2: Informatics for Integrating Biology & the Bedside

G. Hripcsak, J. D. Duke, N. H. Shah, C. G. Reich, V. Huser et al., Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers, Stud. Health Technol. Inform, vol.216, pp.574-582, 2015.

. Ohdsi,

B. J. Bock, C. T. Dolan, G. C. Miller, W. F. Fitter, B. D. Hartsell et al., The Data Warehouse as a Foundation for PopulationBased Reference Intervals, Am. J. Clin. Pathol, vol.120, pp.662-670, 2003.

A. K. Manrai, C. J. Patel, and J. P. Ioannidis, In the Era of Precision Medicine and Big Data, Who Is Normal?, JAMA, 2018.

N. Rappoport, H. Paik, B. Oskotsky, R. Tor, E. Ziv et al., Creating ethnicity-specific reference intervals for lab tests from EHR data, 2017.

P. F. Brennan and W. W. Stead, Assessing Data Quality: From Concordance, through Correctness and Completeness, to Valid Manipulatable Representations, J. Am. Med. Informatics Assoc, vol.7, pp.106-107, 2000.

N. G. Weiskopf and C. Weng, Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research, J. Am. Med. Informatics Assoc, vol.20, pp.144-151, 2013.

M. G. Kahn, T. J. Callahan, J. Barnard, A. E. Bauck, J. Brown et al., A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data, EGEMs (Generating Evid, Methods to Improv. Patient Outcomes), vol.4, p.18, 2016.

C. Sáez, J. Martínez-miranda, M. Robles, and J. M. García-gómez, Organizing data quality assessment of shifting biomedical data, Stud. Health Technol. Inform, vol.180, pp.721-726, 2012.

R. Khare, L. Utidjian, B. J. Ruth, M. G. Kahn, E. Burrows et al.,

. Bailey, A longitudinal analysis of data quality in a large pediatric data research network, J. Am. Med. Informatics Assoc, vol.24, pp.1072-1079, 2017.

K. Lee, N. Weiskopf, and J. Pathak, A Framework for Data Quality Assessment in Clinical Research Datasets, Annu. Symp. Proceedings. AMIA Symp. 2017, pp.1080-1089, 2017.

. Iso/ts, Master data: Exchange of characteristic data: Syntax, semantic encoding, and conformance to data specification

R. G. Hauser, D. B. Quine, and A. Ryder, LabRS: A Rosetta stone for retrospective standardization of clinical laboratory test results, J. Am. Med. Informatics Assoc, vol.25, pp.121-126, 2018.

C. Sáez, O. Zurriaga, J. Pérez-panadés, I. Melchor, M. Robles et al., Applying probabilistic temporal and multisite data quality control methods to a public health mortality registry in Spain: a systematic approach to quality control of repositories, J. Am. Med. Informatics Assoc, vol.23, pp.1085-1095, 2016.

T. Dasu, G. T. Vesonder, and J. R. Wright, Data quality through knowledge engineering, Proc. Ninth ACM SIGKDD Int. Conf. Knowl. Discov. Data Min.KDD '03, p.705, 2003.

T. Dasu, T. Dasu, S. Krishnan, S. Venkatasubramanian, and K. Yi, An informationtheoretic approach to detecting changes in multi-dimensional data streams, PROC. SYMP. INTERFACE Stat. Comput. Sci. Appl, 2006.

T. Dasu, S. Krishnan, D. Lin, S. Venkatasubramanian, and K. Yi, Change (Detection) You Can Believe in: Finding Distributional Shifts in Data Streams, pp.21-34, 2009.

L. Berti-equille, T. Dasu, and D. Srivastava, Discovery of complex glitch patterns: A novel approach to Quantitative Data Cleaning, IEEE 27th Int. Conf. Data Eng, pp.733-744, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01855785

M. M. Breunig, H. Kriegel, R. T. Ng, J. Sander, and L. , Proc. 2000 ACM SIGMOD Int. Conf. Manag. Data-SIGMOD '00, pp.93-104, 2000.

E. M. Knorr, R. T. Ng, and V. Tucakov, Distance-based outliers: algorithms and applications, VLDB J. Int. J. Very Large Data Bases, vol.8, pp.237-253, 2000.

E. M. Knorr and R. T. Ng, Notion of Outliers: Properties and Computation, in: KDD Proc, 1997.

M. Yakout, A. K. Elmagarmid, J. Neville, M. Ouzzani, and I. F. Ilyas, Guided data repair, Proc. VLDB Endow, vol.4, pp.279-289, 2011.

M. Stonebraker-mit, D. Bruckner, I. F. Ilyas-qcri, G. B. Qcri, M. Cherniack et al., Data Curation at Scale: The Data Tamer System, Bienn. Conf. Innov. Data Syst. Res, 2013.

I. F. Xu-chu, P. Ilyas, and . Papotti, Holistic data cleaning: Putting violations into context, IEEE 29th Int. Conf. Data Eng, pp.458-469, 2013.

J. S. Brown, M. Kahn, and D. Toh, Data Quality Assessment for Comparative Effectiveness Research in Distributed Data Networks, Med. Care, vol.51, pp.22-29, 2013.

M. D. Wilkinson, M. Dumontier, .. J. Ij, G. Aalbersberg, M. Appleton et al.,

S. Schaik, E. Sansone, T. Schultes, T. Sengstag, G. Slater et al., The FAIR Guiding Principles for scientific data management and stewardship, Sci. Data, vol.3, p.160018, 2016.

T. Dasu and T. Johnson, Exploratory Data Mining and Data Cleaning, 2003.

P. Degoulet, The HEGP component-based clinical information system, Int. J. Med. Inform, vol.69, pp.101-107, 2003.

E. Zapletal, N. Rodon, N. Grabar, and P. Degoulet, Methodology of integration of a clinical data warehouse with a clinical information system: the HEGP case, Stud. Health Technol. Inform, vol.160, pp.193-200, 2010.

A. Jannot, E. Zapletal, P. Avillach, M. Mamzer, A. Burgun et al., The Georges Pompidou University Hospital Clinical Data Warehouse: A 8-years follow-up experience, Int. J. Med. Inform, vol.102, pp.21-28, 2017.

R. Koenker, quantreg: Quantile Regression, 2017.

R. Killick, P. Fearnhead, and I. A. Eckley, Optimal Detection of Changepoints With a Linear Computational Cost, J. Am. Stat. Assoc, vol.107, pp.1590-1598, 2012.

R. Killick and I. A. Eckley, changepoint : An R Package for Changepoint Analysis, J. Stat. Softw, vol.58, 2014.

N. P. Tatonetti, Translational medicine in the Age of Big Data, Brief. Bioinform, 2017.

N. G. Weiskopf, G. Hripcsak, S. Swaminathan, and C. Weng, Defining and measuring completeness of electronic health records for secondary use, J. Biomed. Inform, vol.46, pp.830-836, 2013.

H. Estiri, K. A. Stephens, J. G. Klann, and S. N. Murphy, Exploring completeness in clinical data research networks with DQe-c, J. Am. Med. Informatics Assoc, vol.25, pp.17-24, 2018.

C. C. Aggarwal and P. S. Yu, Outlier detection for high dimensional data, Proc. 2001 ACM SIGMOD Int. Conf. Manag. Data-SIGMOD '01, pp.37-46, 2001.

N. G. Weiskopf, S. Bakken, G. Hripcsak, and C. Weng, A Data Quality Assessment Guideline for Electronic Health Record Data Reuse, EGEMs (Generating Evid, Methods to Improv. Patient Outcomes), vol.5, p.14, 2017.