Minimal perfect hash functions in large scale bioinformatics Problem

Genomics and metagenomics, which generate huge sets of short genomic sequences, bring their own share of high-performance problems. To extract relevant information from the huge data sets generated by current sequencing techniques, one must rely on extremely scalable methods and solutions. Indexing billions of objects is a fundamental need in this field, yet it is generally considered too expensive. In this paper we propose a straightforward indexing structure that scales to billions of elements, and we propose two direct applications in genomics and metagenomics. We show that our proposal solves problem instances for which no other known solution scales up. We believe that many tools and applications could benefit from either the fundamental data structure we provide or from the applications developed from this structure.

Domains

Bioinformatics [q-bio.QM]; Data Structures and Algorithms [cs.DS]

Main file

PosterJOBIM2016.pdf (13.51 MB)

Origin: Files produced by the author(s)

Contributor: Antoine Limasset

https://hal.science/hal-01341718

Submitted on: Monday, July 4, 2016, 16:24:27

Last modified on: Friday, April 19, 2024, 16:18:57

Long-term archiving on: Wednesday, October 5, 2016, 14:48:37

Dates and versions

hal-01341718, version 1 (04-07-2016)

Identifiers

HAL Id: hal-01341718, version 1

Cite

Antoine Limasset, Camille Marchet, Pierre Peterlongo, Lucie Bittner. Minimal perfect hash functions in large scale bioinformatics Problem. JOBIM 2016, Jun 2016, Lyon, France. ⟨hal-01341718⟩


Collections

INSTITUT-TELECOM ENS-PARIS UPMC UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA ADMM CENTRALESUPELEC IRISA-D7 INRIA2 PSL UR1-MATH-STIC UR1-UFR-ISTIC IBPS UNIV-RENNES SORBONNE-UNIVERSITE SU-SCIENCES UR1-MATH-NUM SBR
