Hybrid de novo tandem repeat detection using short and long reads

Abstract:

Background: Tandem repeats are among the most studied genome rearrangements and have a considerable impact on the genetic background of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high-quality results. However, in a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads produced by second-generation sequencing methods are not long enough to span regions that contain long repeats. This length limitation is addressed by the long reads produced by third-generation sequencing platforms such as Pacific Biosciences technologies. Nevertheless, the gain in read length comes with a significant increase in the error rate, and a main objective of current studies on long reads is to cope with error rates of up to 16%.

Methods: In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high quality of short reads with the length of long reads. Our hybrid algorithm uses the set of short reads to detect tandem repeat patterns, based on a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed using local greedy assemblies.

Results: MixTaR is tested with both simulated and real reads from complex organisms. For a complete analysis of its robustness to errors, we use short and long reads with different error rates. The results are then analysed in terms of the number of tandem repeats detected and the length of their patterns.

Conclusions: Our method shows high precision and sensitivity. With low false positive rates even for highly erroneous reads, MixTaR is able to detect accurate tandem repeats with pattern lengths varying over a wide interval.
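To make the pattern-detection idea in the abstract more concrete, the sketch below shows one common way a tandem repeat can surface in a de Bruijn graph built from short reads: the repeated unit forms a cycle. This is only an illustration under simplifying assumptions (error-free reads, exact matching) and is not the MixTaR implementation. The function names `build_de_bruijn`, `find_cycle_pattern`, `canonical_rotation` and `max_copies` are hypothetical, and the exact-match check against a long read stands in for the validation step, which in practice would have to tolerate the high error rate of long reads.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def find_cycle_pattern(graph, start, max_len):
    """Depth-first search for a simple cycle through `start`.

    Returns the string spelled by the cycle (one character per node,
    so it is some rotation of the repeated unit), or None if no cycle
    with at most `max_len` nodes is found.
    """
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in sorted(graph.get(node, ())):
            if nxt == start:
                # Each node on the cycle contributes its last character.
                return "".join(n[-1] for n in path)
            if nxt not in path and len(path) < max_len:
                stack.append((nxt, path + [nxt]))
    return None


def canonical_rotation(pattern):
    """Return the lexicographically smallest rotation of `pattern`."""
    return min(pattern[i:] + pattern[:i] for i in range(len(pattern)))


def max_copies(long_read, pattern):
    """Largest number of consecutive exact copies of `pattern` in `long_read`.

    Assumption: exact matching only. Real long reads would need an
    error-tolerant comparison instead.
    """
    best = 0
    p = len(pattern)
    for i in range(len(long_read)):
        n = 0
        while long_read[i + n * p:i + (n + 1) * p] == pattern:
            n += 1
        best = max(best, n)
    return best


if __name__ == "__main__":
    # Toy short reads sampled from a region that repeats the unit ACGT.
    short_reads = ["ACGTACGTAC", "GTACGTACGT"]
    graph = build_de_bruijn(short_reads, k=4)
    pattern = find_cycle_pattern(graph, "ACG", max_len=10)
    unit = canonical_rotation(pattern)
    # Toy long read used to check how many copies of the unit it spans.
    print(unit, max_copies("TTACGTACGTACGTGG", unit))
```

The choice of the start node and of `k` is left to the caller here; a fuller treatment would try every node as a start and filter cycles by k-mer coverage.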
Document type:
Journal article
BMC Medical Genomics, BioMed Central, 2015, 8 (Suppl 3), pp.S5. 〈10.1186/1755-8794-8-S3-S5〉

https://hal.archives-ouvertes.fr/hal-01214038
Contributor: Guillaume Fertin
Submitted on: Friday, October 9, 2015 - 16:07:23
Last modified on: Thursday, April 5, 2018 - 10:36:49

Citation

Guillaume Fertin, Géraldine Jean, Andreea Radulescu, Irena Rusu. Hybrid de novo tandem repeat detection using short and long reads. BMC Medical Genomics, BioMed Central, 2015, 8 (Suppl 3), pp.S5. 〈10.1186/1755-8794-8-S3-S5〉. 〈hal-01214038〉
