Abstract: Genome-wide association studies (GWAS) aim to detect correlations between a phenotypic trait and a set of hundreds of thousands of biological markers called single nucleotide polymorphisms (SNPs). As is usual in genomics, the data suffer from heterogeneity that generates dependency between SNPs. To account for the spatial dependency due to linkage disequilibrium, the first step in GWAS, called tagging, consists in selecting the most informative markers so that a smaller set of markers can be analysed. However, usual tagging techniques are not designed to account for interaction. In this work we propose a novel method to select a set of tag-SNPs that optimally represents the set of all pairs of SNPs. The correlation between two pairs of SNPs is measured by the normalized mutual information. To demonstrate its feasibility, we apply our method to a set of simulated datasets obtained from a reference panel of individuals. Furthermore, a comparison of our method with existing tagging strategies shows that our method is powerful and that it significantly decreases the proportion of false discoveries.
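
To make the similarity measure concrete, the following is a minimal sketch of a normalized mutual information between two pairs of SNPs. It is an illustration under stated assumptions, not the method's actual implementation: genotypes are assumed to be coded 0/1/2, each pair of SNPs is treated as a single joint categorical variable, and the mutual information is normalized by the smaller of the two entropies. The abstract does not specify the normalization, and the function names `entropy`, `pair_variable`, and `normalized_mutual_information` are hypothetical.

```python
from collections import Counter
from math import log


def entropy(values):
    """Empirical Shannon entropy (in nats) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def pair_variable(snp_a, snp_b):
    """Treat a pair of SNPs as one joint categorical variable (assumption)."""
    return list(zip(snp_a, snp_b))


def normalized_mutual_information(x, y):
    """Mutual information I(X;Y) divided by min(H(X), H(Y)).

    The choice of min-entropy normalization is an assumption made for
    this sketch; other normalizations (mean, max, joint entropy) exist.
    """
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))
    mi = max(0.0, hx + hy - hxy)  # clamp tiny negative rounding errors
    denom = min(hx, hy)
    return mi / denom if denom > 0 else 0.0


# Toy genotypes for four SNPs over eight individuals (made-up values).
snp1 = [0, 1, 2, 0, 1, 2, 0, 1]
snp2 = [0, 0, 1, 1, 2, 2, 0, 1]
snp3 = [0, 1, 2, 0, 1, 2, 1, 0]
snp4 = [1, 0, 1, 1, 2, 2, 0, 0]

pair_12 = pair_variable(snp1, snp2)
pair_34 = pair_variable(snp3, snp4)
print(normalized_mutual_information(pair_12, pair_34))
```

Under this normalization the score lies between 0 and 1: it is 0 when the two pairs are empirically independent and 1 when one pair fully determines the other.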