Journal article, BMC Bioinformatics, Year: 2015

UrQt: an efficient software for the Unsupervised Quality trimming of NGS data

Abstract

Background: Quality control is a necessary step in any Next Generation Sequencing analysis. Although customary, this step still requires manual intervention to empirically choose tuning parameters according to various quality statistics. Moreover, current quality control procedures that provide a "good quality" data set are not optimal and discard many informative nucleotides. To address these drawbacks, we present a new quality control method, implemented in the UrQt software, for Unsupervised Quality trimming of Next Generation Sequencing reads.

Results: Our trimming procedure relies on a well-defined probabilistic framework to detect the best segmentation of a read into two segments of unreliable nucleotides framing a segment of informative nucleotides. Our software requires only one user-friendly parameter: the minimal quality threshold (phred score) needed to consider a nucleotide informative. This parameter is independent of both the experiment and the quality of the data. The procedure is implemented in C++ as efficient, parallelized software with a low memory footprint. We tested the performance of UrQt against the best-known trimming programs on seven RNA and DNA sequencing experiments and demonstrated its optimality in the resulting tradeoff between the number of trimmed nucleotides and the quality objective.

Conclusions: By finding the best segmentation to delimit a segment of good quality nucleotides, UrQt greatly increases the number of reads and of nucleotides that can be retained for a given quality objective. UrQt source files, binary executables for different operating systems and documentation are freely available (under the GPLv3) at the following address: https://lbbe.univ-lyon1.fr/-UrQt-.html.
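
To make the idea of framing one informative segment between two unreliable ones more concrete, here is a minimal Python sketch. It is not UrQt's algorithm: the abstract does not detail the probabilistic model, so the sketch swaps in a much simpler stand-in, a maximum-scoring-segment search (Kadane's algorithm) over the per-base values "phred score minus threshold". The function names, the Phred+33 quality encoding, and the example read are assumptions made for illustration only. The one fact used beyond the abstract is the standard phred definition, where a score Q corresponds to an error probability of 10^(-Q/10).

```python
def phred_to_error_prob(q):
    """Standard phred definition: P(error) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming the Phred+33 encoding."""
    return [ord(char) - 33 for char in quality_string]


def best_segment(quals, threshold):
    """Return the half-open interval (start, end) of the contiguous segment
    that maximizes sum(q - threshold).

    Illustrative stand-in only, NOT UrQt's likelihood-based segmentation.
    Returns (0, 0) when no base scores above the threshold.
    """
    best_sum, best = 0, (0, 0)
    current_sum, current_start = 0, 0
    for i, q in enumerate(quals):
        # A non-positive running sum cannot help: restart the segment here.
        if current_sum <= 0:
            current_sum, current_start = 0, i
        current_sum += q - threshold
        if current_sum > best_sum:
            best_sum, best = current_sum, (current_start, i + 1)
    return best


def trim_read(sequence, quality_string, threshold):
    """Trim both ends of a read down to its best-scoring segment."""
    start, end = best_segment(decode_phred33(quality_string), threshold)
    return sequence[start:end], quality_string[start:end]


if __name__ == "__main__":
    # Hypothetical read: low-quality ends framing a high-quality middle.
    seq, qual = trim_read("ACGTACGT", "##?DG?&#", threshold=20)
    print(seq, qual)
```

As in the abstract, the only tuning knob in this sketch is the quality threshold; a phred threshold of 20, for example, corresponds to an error probability of 1%.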

Dates and versions

hal-02025607, version 1 (19-02-2019)

Cite

Laurent Modolo, E. Lerat. UrQt: an efficient software for the Unsupervised Quality trimming of NGS data. BMC Bioinformatics, 2015, 16, pp.137. ⟨10.1186/s12859-015-0546-8⟩. ⟨hal-02025607⟩