Journal article, European Journal of Human Genetics, Year: 2011

Imputation of low frequency variants using the HapMap3 benefits from large, diverse reference sets

Luke Jostins
  • Role: Corresponding author
  • PersonId: 909112

Katherine Morley
  • Role: Author
Jeffrey C Barrett
  • Role: Author

Abstract

Imputation allows the inference of unobserved genotypes in low-density datasets, and is often used to test for disease association at variants that are poorly captured by standard genotyping chips (such as low frequency variants). While much effort has gone into developing the best imputation algorithms, less is known about the effects of reference set choice on imputation accuracy. We assess the improvements afforded by increases in reference size and diversity, specifically comparing the HapMap2 dataset that has been used to date for imputation, and the new HapMap3 dataset, which contains more samples from a more diverse range of populations. We find that, for imputation into Western European samples, the HapMap3 reference provides more accurate imputation with better calibrated quality scores than HapMap2, and that increasing the number of HapMap3 populations included in the reference set grants further improvements. Improvements are most pronounced for low frequency variants (frequency < 5%), with the largest and most diverse reference sets bringing the accuracy of imputation of low frequency variants close to that of common ones. For low frequency variants, reference set diversity can improve the accuracy of imputation independent of reference sample size. HapMap3 reference sets provide significant increases in imputation accuracy relative to HapMap2, and are of particular use if highly accurate imputation of low frequency variants is required. Our results suggest that although the sample sizes from the 1000 Genomes Pilot Project will not allow reliable imputation of low frequency variants, the larger sample sizes of the main project will.
Main file
PEER_stage2_10.1038%2Fejhg.2011.10.pdf (153.11 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00618484, version 1 (02-09-2011)

Identifiers

HAL Id: hal-00618484
DOI: 10.1038/ejhg.2011.10

Cite

Luke Jostins, Katherine Morley, Jeffrey C Barrett. Imputation of low frequency variants using the HapMap3 benefits from large, diverse reference sets. European Journal of Human Genetics, 2011, ⟨10.1038/ejhg.2011.10⟩. ⟨hal-00618484⟩

Collections

PEER
17 views
75 downloads
