Statistics and Data Quality: Towards More Collaboration Between These Communities
Abstract
In the summer of 1980, during a conference given at the Institute of Statistics of Paris, a very impressive presentation on an FCA analysis, one that opened up multiple lines of investigation, turned out to be false because it was based on inaccurate data. Thirty years later, data quality is an autonomous discipline with dedicated academic master's courses (Talburt et al. (2006)), publications (Redman (2001), Wand and Wang (1996)) and software (Gouasdoue et al. (2007)). In fact, a plethora of dimensions, metrics, models and database design techniques (Wang et al. (2001)) are now defined to handle data and their quality in the same flow, thus helping statisticians qualify and evaluate their results (Berti-Equille (2007)). On the other hand, statistical models have been proposed to define the dimensions' metrics, detect outliers and anomalous data, analyze data heterogeneity, etc. (Batini and Scannapieco (2006)). Let us, then, build a bridge between the two communities and have a data quality track at CompStat 2011!