Ghost lineages deceive introgression tests and call for a new null hypothesis

Théo Tricou 1, Eric Tannier 2, 1, Damien de Vienne 1
2 BEAGLE - Artificial Evolution and Computational Biology
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information, Inria Grenoble - Rhône-Alpes, LBBE - Laboratoire de Biométrie et Biologie Evolutive - UMR 5558
Abstract: The data known and sampled in any evolutionary study are always a small part of what exists, known or not, or of what existed in the past and is now extinct. It is therefore likely that all detected past horizontal gene fluxes (hybridizations, introgressions, admixtures or transfers) involve “ghosts”, that is, extinct or unsampled lineages. All scientists acknowledge the presence of these ghosts, but almost all hope, and proceed as if, their blurring influence were low, like a background noise that can reasonably be ignored. We assess this under-examined hypothesis by qualifying and quantifying the effect of ghost lineages on introgression detection with the popular D-statistics method. We use a genomic dataset of bears to illustrate and circumscribe the possibility of misinterpretation. We then show on simulated data that, under conditions that are far from unrealistic, most conclusions drawn from D-statistics about the existence of introgression and the identity of the donors and recipients of horizontal gene flow are erroneous. In particular, the use of a distant outgroup, usually presented as solid ground for these tests, in fact increases the error probability and leads to false interpretations in a vast majority of cases. We argue for a switch of the null hypothesis: the results of gene flux detection methods should be interpreted with the full and visible participation of the unknown ghosts.
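For readers unfamiliar with the test discussed above, the following is a minimal sketch of the D-statistic (ABBA-BABA) in its simplest site-counting form, D = (ABBA - BABA) / (ABBA + BABA), for a four-taxon tree (((P1, P2), P3), Outgroup). It is not the authors' pipeline: the function name `d_statistic`, the input layout (one allele per taxon per biallelic site) and the toy data are assumptions made for illustration only. A D value far from zero is conventionally read as gene flow between P3 and either P1 or P2; the abstract argues that the same signal can arise from an unsampled or extinct lineage.

```python
def d_statistic(sites):
    """Return Patterson's D from an iterable of (p1, p2, p3, outgroup) alleles.

    Hypothetical helper for illustration: each tuple is one biallelic site,
    and the outgroup allele is taken as the ancestral state "A".
    """
    abba = 0  # P2 and P3 share the derived allele, P1 is ancestral
    baba = 0  # P1 and P3 share the derived allele, P2 is ancestral
    for p1, p2, p3, out in sites:
        if p3 == out or p1 == p2:
            continue  # site is not informative for the test
        if p1 == out and p2 == p3:
            abba += 1
        elif p2 == out and p1 == p3:
            baba += 1
    total = abba + baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites in the input")
    return (abba - baba) / total


# Toy data (made up): three ABBA sites, one BABA site, one uninformative site.
toy_sites = [
    ("A", "G", "G", "A"),
    ("A", "G", "G", "A"),
    ("A", "G", "G", "A"),
    ("G", "A", "G", "A"),
    ("A", "A", "G", "A"),
]
d_value = d_statistic(toy_sites)
print(d_value)
```

The statistic depends only on the four sampled taxa, so it cannot by itself tell whether the excess of shared alleles came from a sampled lineage or from a ghost. In practice, significance is usually assessed with a block jackknife over genomic windows, which this sketch omits.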
