HAL: hal-00661214, version 1

On the Genetic interpretation of Between-Group PCA on SNP data
Denis Laloë 1, Mathieu Gautier 2
(2011-03-18)

Principal Component Analysis (PCA) is a standard method for exploring large SNP data sets. In this study we propose additional genetics-oriented interpretations of PCA results for characterizing population genetic structure from SNP data. We show that a normed PCA on biallelic SNP haplotypes is equivalent to a Multiple Correspondence Analysis on haplotypes and to a PCA on the r correlation matrix, where r is the square root of the r² linkage disequilibrium measure. Each resulting principal component describes a typology and provides a measure of the underlying SNP contributions, which may be further interpreted in terms of FST. In addition, PCA can be partitioned into sub-analyses (between-group and within-group). Between-group PCA maximizes the variance between groups and delivers principal components with maximum FST. Only per-group allele frequencies and relative frequencies are needed to compute between-group PCA. Finally, chromosomal regions containing SNPs with high contributions may be interpreted as footprints of selection. As an illustration of the approach, we analyzed human chromosome 2 haplotypes sampled from three HapMap populations (of African, Asian and European origin). We show that SNPs within or close to the EDAR and LCT genes exhibit the highest typological values, in agreement with previous studies.
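The statement that between-group PCA needs only per-group allele frequencies can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that "relative frequencies" means the relative sizes of the groups, that every SNP is polymorphic in the pooled sample, and that FST for a biallelic SNP is taken as the weighted variance of group allele frequencies divided by p(1 - p), which is one common definition. The function name between_group_pca and the example numbers are hypothetical.

```python
import numpy as np

def between_group_pca(freqs, weights):
    """Between-group normed PCA from group allele frequencies (illustrative sketch).

    freqs   : (n_groups, n_snps) allele frequency of each SNP in each group
    weights : (n_groups,) relative group sizes (assumed meaning of
              "relative frequencies"); rescaled here to sum to 1
    """
    freqs = np.asarray(freqs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()

    p_bar = w @ freqs                      # pooled allele frequency per SNP
    sd = np.sqrt(p_bar * (1.0 - p_bar))    # SD of a 0/1 allele indicator
    if np.any(sd == 0):
        raise ValueError("monomorphic SNPs must be removed first")

    # Group means of the standardized (normed) SNP variables.
    z = (freqs - p_bar) / sd

    # Per-SNP between-group variance of the standardized variable:
    # sum_k w_k (p_k - p_bar)^2 / (p_bar (1 - p_bar)).
    fst = w @ z**2

    # Weighted SVD: squared singular values are the between-group
    # eigenvalues, rows of vt are the SNP loadings of each axis.
    _, s, vt = np.linalg.svd(np.sqrt(w)[:, None] * z, full_matrices=False)
    eigvals = s**2
    loadings = vt.T                        # (n_snps, n_axes)
    scores = z @ loadings                  # group coordinates on each axis
    return eigvals, loadings, scores, fst

# Hypothetical data: 3 groups, 4 SNPs.
freqs = [[0.10, 0.50, 0.80, 0.30],
         [0.40, 0.50, 0.20, 0.35],
         [0.90, 0.45, 0.50, 0.30]]
weights = [60, 45, 45]

eigvals, loadings, scores, fst = between_group_pca(freqs, weights)
contributions = loadings**2                # share of each SNP in each axis
```

With g groups at most g - 1 eigenvalues are non-zero, and their sum equals the sum of the per-SNP FST values computed above, so each axis can be read as a share of the total between-group differentiation. Squared loadings give the contribution of each SNP to an axis.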
1:  Génétique Animale et Biologie Intégrative (GABI)
Institut national de la recherche agronomique (INRA) : UMR1313 – AgroParisTech
2:  Centre de biologie et gestion des populations (CBGP)
Centre de coopération internationale en recherche agronomique pour le développement [CIRAD] : UMR55 – Institut national de la recherche agronomique (INRA) – Université Montpellier II - Sciences et techniques – Institut de recherche pour le développement [IRD] : UR022
Life Sciences/Genetics/Animal genetics
SNP – Principal Component Analysis – Fst – Selection footprint