Conference papers

Construction of a de Bruijn Graph for Assembly from a Truncated Suffix Tree

Bastien Cazaux (1), Thierry Lecroq (2), Eric Rivals (1)
(1) MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
(2) TIBS - LITIS - Equipe Traitement de l'information en Biologie Santé
LITIS - Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes
Abstract : In the life sciences, determining the sequence of bio-molecules is an essential step towards understanding their functions and interactions inside an organism. Powerful technologies yield huge quantities of short sequencing reads that need to be assembled to infer the complete target sequence. These constraints favour the use of a version of the de Bruijn Graph (DBG) dedicated to assembly. The de Bruijn Graph is usually built directly from the reads, which is time and space consuming. Given a set R of input words, well-known data structures, like the generalised suffix tree, can index all the substrings of words in R. In the context of DBG assembly, only substrings of length k + 1 and some of length k are useful. A truncated version of the suffix tree can index those efficiently. As indexes are exploited for numerous purposes in bioinformatics, such as read cleaning, filtering, or even analysis, it is important to enable the community to reuse an existing index and build the DBG directly from it. In an earlier work we provided the first algorithms for building the DBG from a suffix tree or a suffix array. Here, we exhibit an algorithm that exploits a reduced version of the truncated suffix tree and computes the DBG from it. Importantly, a variation of this algorithm is also shown to compute the contracted DBG, which offers great benefits in practice. Both algorithms are linear in time and space in the size of the output.
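To make concrete the remark that only substrings of length k + 1 and some of length k matter, here is a minimal sketch of the direct construction from the reads, which the abstract describes as the usual but costly approach. It is not the truncated suffix tree algorithm of the paper. It assumes the common definition in which nodes are the k-mers occurring in the reads and an arc links the length-k prefix of each (k+1)-mer to its length-k suffix; the name `build_dbg` and the toy reads are hypothetical.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Naive de Bruijn graph built directly from the reads (illustrative only).

    Assumed definition: nodes are the k-mers occurring in some read, and each
    (k+1)-mer occurring in some read gives an arc from its length-k prefix
    to its length-k suffix.
    """
    nodes = set()
    arcs = defaultdict(set)
    for read in reads:
        # Every substring of length k is a node.
        for i in range(len(read) - k + 1):
            nodes.add(read[i:i + k])
        # Every substring of length k + 1 is an arc between two nodes.
        for i in range(len(read) - k):
            kp1_mer = read[i:i + k + 1]
            arcs[kp1_mer[:-1]].add(kp1_mer[1:])
    return nodes, arcs


if __name__ == "__main__":
    # Hypothetical toy input: two overlapping reads, k = 3.
    nodes, arcs = build_dbg(["ACGTA", "CGTAC"], 3)
    print(sorted(nodes))  # ['ACG', 'CGT', 'GTA', 'TAC']
    for source in sorted(arcs):
        print(source, "->", sorted(arcs[source]))
```

This sketch scans every read and stores each substring explicitly, which illustrates the time and space cost that motivates reusing an existing index instead.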

https://hal.archives-ouvertes.fr/hal-01955978
Contributor : Thierry Lecroq
Submitted on : Monday, January 14, 2019 - 6:15:54 PM
Last modification on : Tuesday, December 8, 2020 - 10:21:34 AM

Citation

Bastien Cazaux, Thierry Lecroq, Eric Rivals. Construction of a de Bruijn Graph for Assembly from a Truncated Suffix Tree. LATA: Language and Automata Theory and Applications, Mar 2015, Nice, France. pp.109-120, ⟨10.1007/978-3-319-15579-1_8⟩. ⟨hal-01955978⟩
