The Deluge of Spurious Correlations in Big Data

Abstract : Very large databases are a major opportunity for science, and data analytics is a remarkable new field of investigation in computer science. The effectiveness of these tools is used to support a “philosophy” against the scientific method as developed throughout history. According to this view, computer-discovered correlations should replace understanding and guide prediction and action. Consequently, there will be no need to give scientific meaning to phenomena, by proposing, say, causal relations, since regularities in very large databases are enough: “with enough data, the numbers speak for themselves”. The “end of science” is proclaimed. Using classical results from ergodic theory, Ramsey theory and algorithmic information theory, we show that this “philosophy” is wrong. For example, we prove that very large databases have to contain arbitrary correlations. These correlations appear only due to the size, not the nature, of data. They can be found in “randomly” generated, large enough databases, which - as we will prove - implies that most correlations are spurious. Too much information tends to behave like very little information. The scientific method can be enriched by computer mining in immense databases, but not replaced by it.
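
The claim that regularities can arise from size alone can be illustrated with a small brute-force check. The sketch below is not taken from the paper; it assumes Van der Waerden's theorem as one representative Ramsey-type result (the abstract only names Ramsey theory in general), and the function names are hypothetical. It checks that every labelling of 9 consecutive positions with 2 labels contains 3 equally spaced positions carrying the same label, while 8 positions are not yet enough to force this.

```python
from itertools import product


def has_mono_ap(coloring, length=3):
    """Return True if `length` equally spaced positions all carry the same label."""
    n = len(coloring)
    for start in range(n):
        for step in range(1, n):
            last = start + (length - 1) * step
            if last >= n:
                break  # larger steps from this start also run past the end
            labels = {coloring[start + i * step] for i in range(length)}
            if len(labels) == 1:
                return True
    return False


def every_coloring_has_mono_ap(n, colors=2, length=3):
    """Exhaustively test all colors**n labellings of n positions (small n only)."""
    return all(
        has_mono_ap(coloring, length)
        for coloring in product(range(colors), repeat=n)
    )


if __name__ == "__main__":
    # Illustrative assumption: 2 labels and patterns of length 3.
    for n in (8, 9):
        print(n, every_coloring_has_mono_ap(n))
```

The pattern found at size 9 says nothing about how the labels were produced: it is present in every possible labelling, including randomly generated ones, which is the sense in which such a regularity depends only on the size of the data.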
Document type :
Conference papers

Cited literature : 47 references

https://hal.archives-ouvertes.fr/hal-01380626
Contributor : Giuseppe Longo
Submitted on : Monday, October 17, 2016 - 12:01:59 PM
Last modification on : Thursday, December 6, 2018 - 12:22:37 PM
Long-term archiving on : Saturday, February 4, 2017 - 9:13:02 PM

File

BigData-Calude-LongoAug21.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License

Citation

Cristian Calude, Giuseppe Longo. The Deluge of Spurious Correlations in Big Data. Lois des dieux, des hommes et de la nature, Oct 2015, Nantes, France. pp.1 - 18, ⟨10.1007/s10699-016-9489-4⟩. ⟨hal-01380626⟩
