Poster communications

Speeding up NGS software development

Abstract : The analysis of NGS data remains a time- and space-consuming task. Many efforts have been made to provide efficient data structures for indexing the terabytes of data generated by fast sequencing machines (Suffix Array, Burrows-Wheeler transform, Bloom Filter, etc.). Mappers, genome assemblers, SNP callers, and similar tools make intensive use of these data structures to keep their memory footprint as low as possible.

The overall efficiency of NGS software comes from a smart combination of how data are represented in computer memory and how they are processed by the available processing units of a processor. Developing such software is thus a real challenge, as it requires a broad range of skills, from high-level data structure and algorithm concepts to fine implementation details.

We have developed a C++ library, called GATB (Genomic Assembly and Analysis Tool Box), to speed up the design of NGS algorithms. This library offers a panel of high-level optimized building blocks. The underlying data structure is the de Bruijn graph, and the general parallelism model is multithreading. The GATB library targets standard computing resources such as current multicore processors (laptop computers, small servers) with a few GB of memory. Hence, from a high-level C++ API, NGS software designers can rapidly build their own software based on state-of-the-art algorithms and data structures of the domain.

To demonstrate the efficiency of the GATB library, several NGS tools have been built with it, such as a contig assembler (Minia), a read corrector (Bloocoo), and an SNP discovery tool (DiscoSNP). The GATB library is written in C++ and is available at the following web site under the GNU Affero GPL license.
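
As an illustration of the de Bruijn graph mentioned in the abstract, the following is a minimal C++ sketch of a node-centric graph stored implicitly as a set of k-mers. It does not use the GATB API: the function names `collect_kmers` and `successors` are hypothetical, the k-mer set is a plain `std::unordered_set` rather than a compact structure such as a Bloom filter, and reverse complements are ignored for brevity.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of GATB): collect every k-mer of every read.
// The resulting set is the node set of the de Bruijn graph.
std::unordered_set<std::string> collect_kmers(
    const std::vector<std::string>& reads, std::size_t k) {
    assert(k >= 2);  // a node needs a non-empty (k-1)-overlap with its neighbors
    std::unordered_set<std::string> kmers;
    for (const std::string& read : reads) {
        // Reads shorter than k contribute no k-mer; the loop body never runs.
        for (std::size_t i = 0; i + k <= read.size(); ++i) {
            kmers.insert(read.substr(i, k));
        }
    }
    return kmers;
}

// Hypothetical helper (not part of GATB): edges are never stored. The
// successors of a node are found by appending each base to its (k-1)-suffix
// and testing membership in the k-mer set.
std::vector<std::string> successors(
    const std::unordered_set<std::string>& kmers, const std::string& node) {
    std::vector<std::string> result;
    const std::string suffix = node.substr(1);
    const std::string bases = "ACGT";
    for (char base : bases) {
        const std::string candidate = suffix + base;
        if (kmers.count(candidate) != 0) {
            result.push_back(candidate);
        }
    }
    return result;
}
```

Because neighbors are recomputed from set membership, only the k-mer set has to be kept in memory. Under this assumption the exact set could be swapped for a more compact membership structure, which is the kind of trade-off between data representation and memory footprint that the abstract describes.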
Contributor : Dominique Lavenier
Submitted on : Friday, January 16, 2015 - 8:51:35 AM
Last modification on : Thursday, January 7, 2021 - 4:14:24 PM
Long-term archiving on : Thursday, September 10, 2015 - 3:55:58 PM




  • HAL Id : hal-01088683, version 1


Erwan Drezen, Guillaume Rizk, Rayan Chikhi, Charles Deltel, Claire Lemaitre, et al. Speeding up NGS software development. Sequencing, Finishing and Analysis in the Future Meeting, May 2014, Santa Fé, United States. ⟨hal-01088683⟩


