| HAL : hal-00753930, version 1 |
| DOI : 10.1007/978-3-642-33122-0_19 |
|
|
| WABI 2012, Ljubljana, Slovenia (2012) |
|
|
|
|
| Space-efficient and exact de Bruijn graph representation based on a Bloom filter |
|
|
| Rayan Chikhi 1, 2, Guillaume Rizk 3 |
|
|
| (01/09/2012) |
|
|
| The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on an in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (> 30 GB). We propose a new encoding of the de Bruijn graph, which occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives. An assembly software implementing this structure, Minia, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours. |
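The encoding described in the abstract can be illustrated with a minimal Python sketch. It is not Minia's implementation: the names `BloomFilter`, `neighbors`, `kmers_of`, `build` and `successors` are hypothetical, reverse complements are ignored, and the "additional structure" is assumed to be a plain set holding the false positives that are adjacent to true k-mers. Under that assumption, a traversal that starts from a true node and only queries neighbors gets exact answers, even with a deliberately undersized filter.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter over strings (illustrative, not tuned for real data)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # One salted BLAKE2b hash per hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def neighbors(kmer):
    """Yield the 8 candidate k-mers that overlap `kmer` by k-1 characters."""
    for base in "ACGT":
        yield kmer[1:] + base   # possible successor
        yield base + kmer[:-1]  # possible predecessor


def kmers_of(seq, k):
    """Return the set of all k-mers of `seq` (the true graph nodes)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build(kmers, num_bits, num_hashes):
    """Build the Bloom filter plus the set of critical false positives.

    Assumption: a false positive is "critical" when it is a neighbor of a
    true k-mer, so a traversal from true nodes could reach it.
    """
    bloom = BloomFilter(num_bits, num_hashes)
    for kmer in kmers:
        bloom.add(kmer)
    critical_fp = {
        nb
        for kmer in kmers
        for nb in neighbors(kmer)
        if nb in bloom and nb not in kmers
    }
    return bloom, critical_fp


def successors(kmer, bloom, critical_fp):
    """Exact out-neighbors of a true k-mer, without storing the k-mer set."""
    candidates = (kmer[1:] + base for base in "ACGT")
    return [c for c in candidates if c in bloom and c not in critical_fp]


# Usage: a tiny filter (32 bits) makes false positives likely on purpose.
kmers = kmers_of("ACGTTGCATGGAC", 4)
bloom, critical_fp = build(kmers, num_bits=32, num_hashes=2)
print(successors("ACGT", bloom, critical_fp))
```

After construction, only `bloom` and `critical_fp` need to stay in memory; the full k-mer set can be discarded, which is where the space saving in this sketch comes from.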
|
| 1 : | GENSCALE (INRIA - IRISA) |
| INRIA – CNRS : UMR6074 – Université de Rennes 1 – École normale supérieure de Cachan - ENS Cachan | |
| 2 : | École normale supérieure de Cachan, antenne de Bretagne (ENS Cachan Bretagne) |
| École normale supérieure de Cachan - ENS Cachan | |
| 3 : | Algorizk [Paris] |
| Algorizk | |
|
| Domain | : | Computer Science/Bioinformatics; Life Sciences/Bioinformatics, Systems Biology |
|
|
| http://hal.archives-ouvertes.fr/hal-00753930 | |
| oai:hal.archives-ouvertes.fr:hal-00753930 | |
| Contributor: Rayan Chikhi | |
| Submitted on: Monday 19 November 2012, 22:53:47 | |
| Last modified on: Thursday 22 November 2012, 14:52:29 | |