Binning unassembled short reads based on $k$-mer abundance covariance using sparse coding
Journal article, GigaScience, Year: 2020


Abstract

Background: Sequence-binning techniques enable the recovery of an increasing number of genomes from complex microbial metagenomes and typically require prior metagenome assembly, incurring the computational cost and drawbacks of the latter, e.g., biases against low-abundance genomes and inability to conveniently assemble multi-terabyte datasets.

Results: We present here a scalable pre-assembly binning scheme (i.e., operating on unassembled short reads) enabling latent genome recovery by leveraging sparse dictionary learning and elastic-net regularization, and its use to recover hundreds of metagenome-assembled genomes, including very low-abundance genomes, from a joint analysis of microbiomes from the LifeLines DEEP population cohort ($n = 1{,}135$, $> 10^{10}$ reads).

Conclusion: We showed that sparse coding techniques can be leveraged to carry out read-level binning at large scale and that, despite lower genome reconstruction yields compared to assembly-based approaches, bin-first strategies can complement the more widely used assembly-first protocols by targeting distinct genome segregation profiles. Read enrichment levels across 6 orders of magnitude in relative abundance were observed, indicating that the method has the power to recover genomes consistently segregating at low levels.
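To make the idea of elastic-net regularized sparse coding more concrete, the following is a minimal toy sketch and not the authors' pipeline. It assumes a small synthetic sample-by-$k$-mer abundance matrix and a fixed, already known dictionary of two latent abundance profiles, whereas the article learns the dictionary from the data. Each $k$-mer profile is coded against the dictionary with ISTA (proximal gradient descent) and assigned to the atom with the largest coefficient. All names (`elastic_net_codes`, `soft_threshold`, the penalty values `l1` and `l2`) and the data are hypothetical, and raw abundance profiles stand in for any covariance-based preprocessing.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def elastic_net_codes(X, D, l1=0.05, l2=0.01, n_iter=500):
    """Minimise 0.5*||X - D A||_F^2 + l1*||A||_1 + 0.5*l2*||A||_F^2 over A with ISTA.

    X: (n_samples, n_kmers) abundance profiles, one column per k-mer.
    D: (n_samples, n_atoms) dictionary, one column per latent profile.
    """
    # Step size = 1 / Lipschitz constant of the smooth part of the objective.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + l2)
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X) + l2 * A  # gradient of the quadratic terms
        A = soft_threshold(A - step * grad, step * l1)  # l1 proximal step
    return A


# Synthetic toy data (assumption): two hypothetical genomes whose abundance
# varies differently across 20 samples, and 60 k-mers, 30 from each genome.
rng = np.random.default_rng(0)
n_samples, n_kmers = 20, 60
D = rng.random((n_samples, 2))
D /= np.linalg.norm(D, axis=0)  # unit-norm dictionary atoms
true_bin = np.repeat([0, 1], n_kmers // 2)
scale = rng.uniform(1.0, 3.0, n_kmers)  # per-k-mer multiplicity
X = D[:, true_bin] * scale + 0.01 * rng.standard_normal((n_samples, n_kmers))

A = elastic_net_codes(X, D)
bins = np.argmax(np.abs(A), axis=0)  # assign each k-mer to its dominant atom
accuracy = float(np.mean(bins == true_bin))
print(f"fraction of k-mers assigned to their generating profile: {accuracy:.2f}")
```

On this low-noise toy data, most $k$-mers should land in the bin of the profile that generated them. The l1 term encourages each $k$-mer to load on few atoms, while the l2 term stabilizes the solution when atoms are correlated, which is the usual motivation for combining the two penalties.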
Main file
giaa028.pdf (2.06 MB)
Origin: Publisher files allowed on an open archive
License: CC BY (Attribution)

Dates and versions

cea-04252039, version 1 (20-10-2023)

Identifiers

HAL Id: cea-04252039
DOI: 10.1093/gigascience/giaa028

Cite

Olexiy Kyrgyzov, Vincent Prost, Stéphane Gazut, Bruno Farcy, Thomas Brüls. Binning unassembled short reads based on $k$-mer abundance covariance using sparse coding. GigaScience, 2020, 9 (4), pp.1-13. ⟨10.1093/gigascience/giaa028⟩. ⟨cea-04252039⟩