Journal article. Genetics Selection Evolution. Year: 2016

Semi-supervised learning for genomic prediction of novel traits with small reference populations: an application to residual feed intake in dairy cattle

Chen Yao
  • Role: Corresponding author
  • PersonId : 1002795

Xiaojin Zhu
  • Role: Author
  • PersonId : 1002796
Kent A. Weigel
  • Role: Author
  • PersonId : 1002797

Abstract

Background: Genomic prediction for novel traits, which can be costly and labor-intensive to measure, is often hampered by low accuracy due to the limited size of the reference population. As an option to improve prediction accuracy, we introduced a semi-supervised learning strategy known as the self-training model, and applied this method to genomic prediction of residual feed intake (RFI) in dairy cattle.

Methods: We describe a self-training model that is wrapped around a support vector machine (SVM) algorithm, which enables it to use data from animals with and without measured phenotypes. Initially, an SVM model was trained using data from 792 animals with measured RFI phenotypes. Then, the resulting SVM was used to generate self-trained phenotypes for 3000 animals for which RFI measurements were not available. Finally, the SVM model was re-trained using data from up to 3792 animals, including those with measured and self-trained RFI phenotypes.

Results: Incorporation of additional animals with self-trained phenotypes enhanced the accuracy of genomic predictions compared to that of predictions that were derived from the subset of animals with measured phenotypes. The optimal ratio of animals with self-trained phenotypes to animals with measured phenotypes (2.5, 2.0, and 1.8) and the maximum increase achieved in prediction accuracy measured as the correlation between predicted and actual RFI phenotypes (5.9, 4.1, and 2.4%) decreased as the size of the initial training set (300, 400, and 500 animals with measured phenotypes) increased. The optimal number of animals with self-trained phenotypes may be smaller when prediction accuracy is measured as the mean squared error rather than the correlation between predicted and actual RFI phenotypes.
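The ratios reported in the Results can be turned into approximate head counts with simple arithmetic. The short Python sketch below does this; the function name `implied_counts` is hypothetical, and because the published ratios are rounded, the resulting counts are only approximate and are not figures stated in the abstract.

```python
# Optimal ratios of self-trained to measured animals reported in the Results,
# keyed by the size of the initial training set (animals with measured RFI).
optimal_ratio = {300: 2.5, 400: 2.0, 500: 1.8}


def implied_counts(n_measured: int, ratio: float) -> tuple[int, int]:
    """Return (self-trained animals, total training animals) implied by a ratio.

    Hypothetical helper for illustration; the counts are approximate because
    the reported ratios are rounded to one decimal place.
    """
    n_self_trained = round(n_measured * ratio)
    return n_self_trained, n_measured + n_self_trained


for n_measured, ratio in optimal_ratio.items():
    n_self, n_total = implied_counts(n_measured, ratio)
    print(f"{n_measured} measured x {ratio} -> "
          f"about {n_self} self-trained, {n_total} in total")
```

Read this way, the optimal ratio falls as the measured set grows, while the implied number of self-trained animals stays in a similar range (roughly 750 to 900).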
Conclusions: Our results demonstrate that semi-supervised learning models that incorporate self-trained phenotypes can achieve genomic prediction accuracies that are comparable to those obtained with models using larger training sets that include only animals with measured phenotypes. Semi-supervised learning can be helpful for genomic prediction of novel traits, such as RFI, for which the size of the reference population is limited, in particular when the animals to be predicted and the animals in the reference population originate from the same herd-environment.
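The three-step self-training procedure described in the Methods (train on measured animals, predict phenotypes for unmeasured animals, re-train on both) can be sketched in Python with scikit-learn's `SVR`. This is a minimal sketch under stated assumptions, not the authors' implementation: the genotypes and phenotypes are simulated, the RBF kernel and the value of `C` are arbitrary choices, only a single self-training round is performed, and the names `self_train_svr`, `accuracy`, and `simulate` are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR


def self_train_svr(X_measured, y_measured, X_unmeasured, n_add, **svr_params):
    """One round of self-training wrapped around support vector regression.

    1. Fit an SVR on animals with measured phenotypes.
    2. Predict ("self-train") phenotypes for the first n_add unmeasured animals.
    3. Refit the SVR on measured plus self-trained animals.
    """
    base = SVR(**svr_params).fit(X_measured, y_measured)
    if n_add == 0:
        return base  # supervised baseline: measured animals only
    X_extra = X_unmeasured[:n_add]
    y_self_trained = base.predict(X_extra)
    X_all = np.vstack([X_measured, X_extra])
    y_all = np.concatenate([y_measured, y_self_trained])
    return SVR(**svr_params).fit(X_all, y_all)


def accuracy(model, X, y):
    """Return (correlation, mean squared error) of predictions against y."""
    pred = model.predict(X)
    corr = float(np.corrcoef(pred, y)[0, 1])
    mse = float(np.mean((pred - y) ** 2))
    return corr, mse


# Simulated data (an assumption for illustration, not the study's data):
# genotypes coded 0/1/2 at 50 markers with additive effects plus noise.
rng = np.random.default_rng(0)
n_markers = 50
marker_effects = rng.normal(size=n_markers)


def simulate(n_animals):
    X = rng.integers(0, 3, size=(n_animals, n_markers)).astype(float)
    y = X @ marker_effects + rng.normal(scale=5.0, size=n_animals)
    return X, y


X_meas, y_meas = simulate(300)      # animals with measured phenotypes
X_unmeas, _ = simulate(750)         # animals without measured phenotypes
X_test, y_test = simulate(200)      # held-out animals for evaluation

for n_add in (0, 750):
    model = self_train_svr(X_meas, y_meas, X_unmeas, n_add, kernel="rbf", C=10.0)
    corr, mse = accuracy(model, X_test, y_test)
    print(f"self-trained animals added: {n_add:4d}  "
          f"correlation: {corr:.3f}  MSE: {mse:.2f}")
```

Reporting both the correlation and the mean squared error mirrors the two accuracy measures named in the Results. Whether self-training helps on simulated data like this depends on the simulation settings, so the sketch illustrates the workflow rather than reproducing the reported gains.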
Main file
12711_2016_Article_262.pdf (1.82 MB)
Origin: Publication funded by an institution

Dates and versions

hal-01479213, version 1 (28-02-2017)

Identifiers

HAL Id: hal-01479213
DOI: 10.1186/s12711-016-0262-5

Cite

Chen Yao, Xiaojin Zhu, Kent A. Weigel. Semi-supervised learning for genomic prediction of novel traits with small reference populations: an application to residual feed intake in dairy cattle. Genetics Selection Evolution, 2016, 48 (1), pp.84. ⟨10.1186/s12711-016-0262-5⟩. ⟨hal-01479213⟩
