| Principal Components Analysis (PCA) is a standard method for exploring large SNP data sets. In this study we propose additional genetics-oriented interpretations of PCA results for characterizing population genetic structure from SNP data. We show that a normed PCA on biallelic SNP haplotypes is equivalent to a Multiple Correspondence Analysis on haplotypes and to a PCA on the r correlation matrix, where r is the square root of the r² linkage disequilibrium measure. Each resulting principal component describes a typology and provides a measure of the underlying SNP contributions, which may further be interpreted in terms of FST. In addition, PCA can be partitioned into sub-analyses (between-group and within-group). Between-group PCA maximizes the variance between groups and delivers principal components with maximum FST. Only per-group allele frequencies and relative frequencies are needed to compute between-group PCA. Finally, chromosomal regions containing SNPs with high contributions may be interpreted as footprints of selection. To illustrate the approach, we analyzed human chromosome 2 haplotypes sampled from three HapMap populations (of African, Asian and European origin). We show that SNPs within or close to the EDAR and LCT genes exhibit the highest typological values, in agreement with previous studies. |
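
The stated link between normed PCA and linkage disequilibrium can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes haplotypes are coded as a 0/1 matrix (rows are haplotypes, columns are biallelic SNPs), uses simulated toy data, and takes the contribution of a SNP to a component to be its squared loading, which is the usual convention in normed PCA. The function names `ld_r` and `normed_pca` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def ld_r(haps):
    """Signed LD correlation r for all SNP pairs of a 0/1 haplotype matrix."""
    n = haps.shape[0]
    p = haps.mean(axis=0)                 # frequency of the allele coded 1
    p_ab = haps.T @ haps / n              # joint frequency of the two '1' alleles
    d = p_ab - np.outer(p, p)             # LD coefficient D
    var = p * (1.0 - p)                   # per-SNP allelic variance
    return d / np.sqrt(np.outer(var, var))


def normed_pca(haps):
    """Normed PCA as an eigen-decomposition of the SNP correlation matrix."""
    corr = np.corrcoef(haps, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]       # reorder to descending
    return corr, eigval[order], eigvec[:, order]


# Toy data (assumption): 200 haplotypes, 5 SNPs that partly copy a shared
# ancestral allele, so that the SNPs are in linkage disequilibrium.
rng = np.random.default_rng(0)
base = rng.integers(0, 2, size=(200, 1))
noise = rng.random((200, 5)) < 0.3
haps = np.where(noise, rng.integers(0, 2, size=(200, 5)), base).astype(float)

corr, eigval, eigvec = normed_pca(haps)
r = ld_r(haps)

# Squared loadings: share of each SNP in each principal component.
contributions = eigvec ** 2

print("correlation matrix equals r:", np.allclose(corr, r))
print("squared correlations equal r^2:", np.allclose(corr ** 2, r ** 2))
print("eigenvalues:", np.round(eigval, 3))
print("SNP contributions to PC1:", np.round(contributions[:, 0], 3))
```

With this coding, the Pearson correlation between two SNP columns is D divided by the square root of the product of the allelic variances, so the matrix diagonalized by a normed PCA is the matrix of signed r values, and its element-wise square is the r² LD matrix. The contributions in each column sum to 1, which is what allows them to be compared across SNPs along a chromosome.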