Speeding up NGS software development

Abstract : The analysis of NGS data remains a time- and space-consuming task. Many efforts have been made to provide efficient data structures for indexing the terabytes of data generated by fast sequencing machines (suffix array, Burrows-Wheeler transform, Bloom filter, etc.). Mappers, genome assemblers, SNP callers, and similar tools make intensive use of these data structures to keep their memory footprint as low as possible.

The overall efficiency of NGS software comes from a smart combination of how data are represented in computer memory and how they are processed by the processing units available inside a processor. Developing such software is thus a real challenge, as it requires a broad range of skills, from high-level data structure and algorithm concepts to fine implementation details.

We have developed a C++ library, called GATB (Genomic Assembly and Analysis Tool Box), to speed up the design of NGS algorithms. This library offers a set of high-level, optimized building blocks. The underlying data structure is the de Bruijn graph, and the general parallelism model is multithreading. The GATB library targets standard computing resources, such as current multicore processors (laptop computer, small server) with a few GB of memory. Hence, from a high-level C++ API, NGS software designers can rapidly build their own tools based on state-of-the-art algorithms and data structures of the domain.

To demonstrate the efficiency of the GATB library, several NGS tools have been built with it, such as a contig assembler (Minia), a read corrector (Bloocoo), and an SNP discovery tool (DiscoSNP). The GATB library is available at http://gatb.inria.fr under the GNU Affero GPL license.
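
For readers unfamiliar with the de Bruijn graph mentioned in the abstract, the following is a minimal sketch of the idea using only the C++ standard library. It is not the GATB API and says nothing about how GATB stores its graph; the names `DeBruijnGraph` and `build_de_bruijn` are hypothetical, and the plain hash-map representation is an assumption chosen for readability rather than memory efficiency. Each distinct k-mer found in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy de Bruijn graph (illustrative only, NOT the GATB API):
// nodes are (k-1)-mers, and each distinct k-mer is a directed edge
// from its prefix (k-1)-mer to its suffix (k-1)-mer.
using DeBruijnGraph =
    std::unordered_map<std::string, std::unordered_set<std::string>>;

DeBruijnGraph build_de_bruijn(const std::vector<std::string>& reads,
                              std::size_t k) {
    DeBruijnGraph graph;
    if (k < 2) return graph;  // need at least 1-mer nodes
    for (const std::string& read : reads) {
        if (read.size() < k) continue;  // read too short to hold a k-mer
        for (std::size_t i = 0; i + k <= read.size(); ++i) {
            std::string prefix = read.substr(i, k - 1);
            std::string suffix = read.substr(i + 1, k - 1);
            graph[prefix].insert(suffix);  // edge prefix -> suffix
            graph[suffix];                 // ensure nodes without successors exist
        }
    }
    return graph;
}
```

With the reads "ACGT" and "CGTA" and k = 3, this sketch yields the nodes AC, CG, GT, and TA, linked in a single path. Storing every node as a string is costly at NGS scale, which is why compact structures such as the Bloom filters cited in the abstract matter in practice.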
https://hal.archives-ouvertes.fr/hal-01088683
Contributor : Dominique Lavenier
Submitted on : Friday, January 16, 2015 - 8:51:35 AM
Last modification on : Thursday, February 7, 2019 - 2:43:27 PM
Long-term archiving on : Thursday, September 10, 2015 - 3:55:58 PM

File

PosterSFAF.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01088683, version 1

Citation

Erwan Drezen, Guillaume Rizk, Rayan Chikhi, Charles Deltel, Claire Lemaitre, et al. Speeding up NGS software development. Sequencing, Finishing and Analysis in the Future Meeting, May 2014, Santa Fe, United States. ⟨hal-01088683⟩
