. Abouelhoda, Replacing suffix trees with enhanced suffix arrays, Journal of Discrete Algorithms, vol.2, issue.1, pp.53-86, 2004.
DOI : 10.1016/S1570-8667(03)00065-0

A. Agrawal and X. Huang, DNAlignTT: Pairwise DNA alignment with sequence specific transition-transversion ratio, 2008 IEEE International Conference on Electro/Information Technology, pp.457-459, 2008.
DOI : 10.1109/EIT.2008.4554345

. Ahmed, Plasmodium falciparum Isolates in India Exhibit a Progressive Increase in Mutations Associated with Sulfadoxine-Pyrimethamine Resistance, Antimicrobial Agents and Chemotherapy, vol.48, issue.3, pp.879-889, 2004.
DOI : 10.1128/AAC.48.3.879-889.2004

S. Altschul and W. Gish, Local alignment statistics, Methods Enzymol, vol.266, pp.460-480, 1996.
DOI : 10.1016/S0076-6879(96)66029-7

. Altschul, Basic local alignment search tool, Journal of Molecular Biology, vol.215, issue.3, pp.403-410, 1990.
DOI : 10.1016/S0022-2836(05)80360-2

. Benson, GenBank, Nucleic Acids Research, vol.33, issue.Database issue, pp.34-38, 2005.
DOI : 10.1093/nar/gki063

D. R. Bentley et al., Accurate whole human genome sequencing using reversible terminator chemistry, Nature, vol.456, pp.53-59, 2008.

. Boissinot, Rapid Exonuclease Digestion of PCR-Amplified Targets for Improved Microarray Hybridization, Clinical Chemistry, vol.53, issue.11, pp.2020-2023, 2007.
DOI : 10.1373/clinchem.2007.091157

. Boyce, Global phytoplankton decline over the past century, Nature, vol.466, issue.7306, pp.591-596, 2010.
DOI : 10.1038/nature09268

. Buckwalter, Dewatering microalgae by forward osmosis, Desalination, vol.312, pp.19-22, 2013.
DOI : 10.1016/j.desal.2012.12.015

M. Burrows and D. Wheeler, A block-sorting lossless data compression algorithm, 1994.

. Cannone, The comparative RNA web (CRW) site: An online database of comparative sequence and structure information for ribosomal, intron, and other RNAs, BMC Bioinformatics, vol.3, issue.1, p.15, 2002.
DOI : 10.1186/1471-2105-3-15

. Chacón, n-step FM-index for faster pattern matching, International Conference on Computational Science, ICCS, pp.70-79, 2013.

. Cheng, Hierarchical and Spatially Explicit Clustering of DNA Sequences with BAPS Software, Molecular Biology and Evolution, vol.30, issue.5, pp.1224-1228, 2013.
DOI : 10.1093/molbev/mst028

. Chevenet, Treedyn: towards dynamic graphics and annotations for analyses of trees, BMC Bioinformatics, vol.7, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00321061

. Chiaromonte, Scoring pairwise genomic sequence alignments, Biocomputing 2002, pp.115-141, 2002.
DOI : 10.1142/9789812799623_0012

. Chin, The Origin of the Haitian Cholera Outbreak Strain, New England Journal of Medicine, vol.364, issue.1, pp.33-42, 2011.
DOI : 10.1056/NEJMoa1012928

International Human Genome Sequencing Consortium, Initial sequencing and analysis of the human genome, Nature, vol.409, pp.860-921, 2001.

. David, SHRiMP2: Sensitive yet Practical Short Read Mapping, Bioinformatics, vol.27, issue.7, pp.1011-1012, 2011.
DOI : 10.1093/bioinformatics/btr046

M. O. Dayhoff and R. M. Schwartz, Chapter 22: A model of evolutionary change in proteins, Atlas of Protein Sequence and Structure, 1978.

J. Dumas and J. Ninio, Efficient algorithms for folding and comparing nucleic acid sequences, Nucleic Acids Research, vol.10, issue.1, pp.197-206, 1982.
DOI : 10.1093/nar/10.1.197

. Durbin, Biological sequence analysis: probabilistic models of proteins and nucleic acids, chapter Pairwise alignment, 1998.
DOI : 10.1017/CBO9780511790492

S. R. Eddy, Accelerated Profile HMM Searches, PLoS Computational Biology, vol.7, issue.10, p.e1002195, 2011.
DOI : 10.1371/journal.pcbi.1002195

R. Edgar, Search and clustering orders of magnitude faster than BLAST, Bioinformatics, vol.26, issue.19, pp.2460-2461, 2010.
DOI : 10.1093/bioinformatics/btq461

S. D. Ehrlich, MetaHIT: The European Union Project on Metagenomics of the Human Intestinal Tract, pp.307-316, 2011.

. English, Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology, PLoS ONE, vol.7, issue.11, p.e47768, 2012.
DOI : 10.1371/journal.pone.0047768

M. Farrar, Striped Smith-Waterman speeds database searches six times over other SIMD implementations, Bioinformatics, vol.23, issue.2, pp.156-161, 2007.
DOI : 10.1093/bioinformatics/btl582

J. Felsenstein, PHYLIP - Phylogeny Inference Package, Cladistics, vol.5, pp.164-166, 1989.

P. Ferragina and G. Manzini, Opportunistic data structures with applications, Proceedings 41st Annual Symposium on Foundations of Computer Science, pp.390-398, 2000.
DOI : 10.1109/SFCS.2000.892127

. Fierer, Cross-biome metagenomic analyses of soil microbial communities and their functional attributes, Proceedings of the National Academy of Sciences, vol.109, issue.52, 2012.
DOI : 10.1073/pnas.1215210110

. Fleischmann, Whole-genome random sequencing and assembly of Haemophilus influenzae Rd, Science, vol.269, issue.5223, pp.496-512, 1995.
DOI : 10.1126/science.7542800

. Flicek, Ensembl 2013, Nucleic Acids Research, vol.41, issue.D1, pp.48-55, 2013.
DOI : 10.1093/nar/gks1236

. Fonseca, Second-generation environmental sequencing unmasks marine metazoan biodiversity, Nature Communications, vol.1, p.98, 2010.
DOI : 10.1038/ncomms1095

. Frith, Parameters for accurate genome alignment, BMC Bioinformatics, vol.11, issue.1, p.80, 2010.
DOI : 10.1186/1471-2105-11-80

. Goecks, Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences, Genome Biology, vol.11, issue.8, p.R86, 2010.
DOI : 10.1186/gb-2010-11-8-r86

. Goffeau, Life with 6000 Genes, Science, vol.274, issue.5287, pp.563-567, 1996.
DOI : 10.1126/science.274.5287.546

. Gonnet, New indices for text: PAT trees and PAT arrays, Information Retrieval: Data Structures and Algorithms, pp.66-82, 1992.

. Gordon, An extreme value theory for long head runs, Probab Th Rel Fields, pp.279-287, 1986.
DOI : 10.1007/BF00699107

O. Gotoh, An improved algorithm for matching biological sequences, Journal of Molecular Biology, vol.162, issue.3, pp.705-708, 1982.
DOI : 10.1016/0022-2836(82)90398-9

Oxford Nanopore Technologies, Oxford Nanopore introduces DNA 'strand sequencing' on the high-throughput GridION platform and presents MinION, a sequencer the size of a USB memory stick, 2012.

D. Gusfield, Algorithms on Strings, Trees, and Sequences, 1997.
DOI : 10.1017/CBO9780511574931

. Hatem, Benchmarking short sequence mapping tools, BMC Bioinformatics, vol.14, 2013.

. Heinz, Burst tries: a fast, efficient data structure for string keys, ACM Transactions on Information Systems, vol.20, issue.2, pp.192-223, 2002.
DOI : 10.1145/506309.506312

S. Henikoff and J. G. Henikoff, Amino acid substitution matrices from protein blocks, Proc Natl Acad Sci, 1992.
DOI : 10.1073/pnas.89.22.10915

URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC50453/pdf

M. J. Heppner, The factor for bitterness in the sweet almond, Genetics, vol.8, pp.390-391, 1923.

. Hingamp, Exploring nucleo-cytoplasmic large DNA viruses in Tara Oceans microbial metagenomes, The ISME Journal, vol.7, issue.9, 2013.
DOI : 10.1038/ismej.2013.59

URL : https://hal.archives-ouvertes.fr/hal-01258223

. Hoffmann, Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures, PLoS Computational Biology, vol.5, issue.9, p.e1000502, 2009.
DOI : 10.1371/journal.pcbi.1000502

. Holtgrewe, A novel and well-defined benchmarking method for second generation read mapping, BMC Bioinformatics, vol.12, issue.1, p.210, 2011.
DOI : 10.1186/1471-2105-12-210

. Homer, BFAST: An Alignment Tool for Large Scale Genome Resequencing, PLoS ONE, vol.4, issue.11, p.e7767, 2009.
DOI : 10.1371/journal.pone.0007767

. Hopcroft, Chapter 2.3.5 Equivalence of deterministic and nondeterministic finite automata, 2004.

. Huang, Identification of ribosomal RNA genes in metagenomic fragments, Bioinformatics, vol.25, issue.10, pp.1338-1340, 2009.
DOI : 10.1093/bioinformatics/btp161

P. Iengar, An analysis of substitution, deletion and insertion mutations in cancer genes, Nucleic Acids Research, vol.40, issue.14, pp.6401-6413, 2012.
DOI : 10.1093/nar/gks290

K. Jabbari and G. Bernardi, Cytosine methylation and CpG, TpG (CpA) and TpA frequencies, Gene, vol.333, pp.143-149, 2004.
DOI : 10.1016/j.gene.2004.02.043

J. Janda and S. Abbott, 16S rRNA Gene Sequencing for Bacterial Identification in the Diagnostic Laboratory: Pluses, Perils, and Pitfalls, Journal of Clinical Microbiology, vol.45, issue.9, pp.2761-2764, 2007.
DOI : 10.1128/JCM.01228-07

. Kallmeyer, Global distribution of microbial abundance and biomass in subseafloor sediment, Proceedings of the National Academy of Sciences, vol.109, issue.40, pp.16213-16216, 2012.
DOI : 10.1073/pnas.1203849109

S. Karlin and S. F. Altschul, Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes, Proceedings of the National Academy of Sciences, vol.87, issue.6, pp.2264-2268, 1990.
DOI : 10.1073/pnas.87.6.2264

. Khosravi, Comparing the genomes of Helicobacter pylori clinical strain UM032 and Mice-adapted derivatives, Gut Pathogens, vol.5, issue.1, 2013.

E. Kim and J. Kececioglu, Inverse Sequence Alignment from Partial Examples, Algorithms in Bioinformatics, 7th International Workshop (WABI), pp.359-370, 2007.
DOI : 10.1007/978-3-540-74126-8_33

. Kopylova, Deciphering metatranscriptomic data, Methods in Molecular Biology, 2013.

. Kopylova, SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data, Bioinformatics, vol.28, issue.24, pp.3211-3217, 2012.
DOI : 10.1093/bioinformatics/bts611

URL : https://hal.archives-ouvertes.fr/hal-00748990

. Korf, Chapter 4 Sequence Similarity, pp.55-71, 2003.

S. Kumar and A. Filipski, Multiple sequence alignment: In pursuit of homologous DNA positions, Genome Research, vol.17, issue.2, pp.127-135, 2007.
DOI : 10.1101/gr.5232407

S. Kurtz, Reducing the space requirement of suffix trees, Software: Practice and Experience, vol.29, issue.13, pp.1149-1171, 1999.
DOI : 10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O

. Kurtz, Versatile and open software for comparing large genomes, Genome Biol, vol.5, p.R12, 2004.

. Kypr, Nucleotide composition bias and CpG dinucleotide content in the genomes of HIV and HTLV, Biochimica et Biophysica Acta (BBA) - Gene Structure and Expression, vol.1009, issue.3, pp.280-282, 1989.
DOI : 10.1016/0167-4781(89)90114-0

. Lam, Compressed indexing and local alignment of DNA, Bioinformatics, vol.24, issue.6, pp.791-797, 2008.
DOI : 10.1093/bioinformatics/btn032

B. Langmead and S. Salzberg, Fast gapped-read alignment with Bowtie 2, Nature Methods, vol.9, issue.4, pp.357-359, 2012.
DOI : 10.1038/nmeth.1923

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3322381

. Langmead, Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biology, vol.10, issue.3, p.R25, 2009.
DOI : 10.1186/gb-2009-10-3-r25

. Lee, rRNASelector: A computer program for selecting ribosomal RNA encoding sequences from metagenomic and metatranscriptomic shotgun libraries, The Journal of Microbiology, vol.49, issue.4, pp.689-691, 2011.
DOI : 10.1007/s12275-011-1213-z

. Leimena, A comprehensive metatranscriptome analysis pipeline and its validation using human small intestine microbiota datasets, BMC Genomics, vol.14, issue.1, p.530, 2013.
DOI : 10.1186/1471-2164-14-530

. Lemay, Novel genomic resources for a climate change sensitive mammal: characterization of the American pika transcriptome, BMC Genomics, vol.14, issue.1, 2013.

. Letsch, The Impact of rRNA Secondary Structure Consideration in Alignment and Tree Reconstruction: Simulated Data and a Case Study on the Phylogeny of Hexapods, Molecular Biology and Evolution, vol.27, issue.11, pp.2507-2521, 2010.
DOI : 10.1093/molbev/msq140

. Levene, Zero-Mode Waveguides for Single-Molecule Analysis at High Concentrations, Science, vol.299, issue.5607, pp.682-686, 2003.
DOI : 10.1126/science.1079700

H. Li, Whole genome simulation, 2012.

H. Li and R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform, Bioinformatics, vol.25, issue.14, pp.1754-1760, 2009.
DOI : 10.1093/bioinformatics/btp324

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705234

H. Li and N. Homer, A survey of sequence alignment algorithms for next-generation sequencing, Briefings in Bioinformatics, vol.11, issue.5, pp.473-483, 2010.
DOI : 10.1093/bib/bbq015

. Li, Mapping short DNA sequencing reads and calling variants using mapping quality scores, Genome Research, vol.18, issue.11, pp.1851-1858, 2008.
DOI : 10.1101/gr.078212.108

. Li, SOAP2: an improved ultrafast tool for short read alignment, Bioinformatics, vol.25, issue.15, pp.1966-1967, 2009.
DOI : 10.1093/bioinformatics/btp336

D. Lipman and W. Pearson, Rapid and sensitive protein similarity searches, Science, vol.227, issue.4693, pp.1435-1441, 1985.
DOI : 10.1126/science.2983426

M. Lisi, Some remarks on the Cantor pairing function, pp.55-65, 2007.

. Loman, Performance comparison of benchtop high-throughput sequencing platforms, Nature Biotechnology, vol.30, issue.5, pp.434-439, 2012.
DOI : 10.1038/nbt.2198

. Ludwig, ARB: a software environment for sequence data, Nucleic Acids Research, vol.32, issue.4, pp.1363-1371, 2004.
DOI : 10.1093/nar/gkh293

J. Lupski and P. Stankiewicz, Genomic Disorders: Molecular Mechanisms for Rearrangements and Conveyed Phenotypes, PLoS Genetics, vol.1, issue.6, p.e49, 2005.
DOI : 10.1371/journal.pgen.0010049

. Ma, PatternHunter: faster and more sensitive homology search, Bioinformatics, vol.18, issue.3, pp.440-445, 2002.
DOI : 10.1093/bioinformatics/18.3.440

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.8001

. Ma, Reconstruction of phyletic trees by global alignment of multiple metabolic networks, BMC Bioinformatics, vol.14, issue.Suppl 2, p.S12, 2013.
DOI : 10.1186/1471-2105-14-S2-S12

U. Manber and E. Myers, Suffix Arrays: A New Method for On-Line String Searches, SIAM Journal on Computing, vol.22, issue.5, pp.935-948, 1993.
DOI : 10.1137/0222058

. Marco-Sola, The GEM mapper: fast, accurate and versatile alignment by filtration, Nature Methods, vol.9, issue.12, pp.1185-1188, 2012.
DOI : 10.1038/nmeth.2221

F. Mashayekhi and M. Ronaghi, Analysis of read length limiting factors in Pyrosequencing chemistry, Analytical Biochemistry, vol.363, issue.2, pp.275-287, 2007.
DOI : 10.1016/j.ab.2007.02.002

E. Mccreight, A Space-Economical Suffix Tree Construction Algorithm, Journal of the ACM, vol.23, issue.2, pp.262-272, 1976.
DOI : 10.1145/321941.321946

. Mears, Modeling a Minimal Ribosome Based on Comparative Sequence Analysis, Journal of Molecular Biology, vol.321, issue.2, pp.215-249, 2002.
DOI : 10.1016/S0022-2836(02)00568-5

. Meek, OASIS, VLDB, pp.910-921, 2003.
DOI : 10.1016/B978-012722442-8/50085-9

S. Mihov and K. Schulz, Fast Approximate Search in Large Dictionaries, Computational Linguistics, vol.30, issue.4, pp.451-477, 2004.
DOI : 10.1162/0891201042544938

P. Mitankin, Universal Levenshtein automata: building and properties, 2005.

. Morozova, Survival of Methanogenic Archaea from Siberian Permafrost under Simulated Martian Thermal Conditions, Origins of Life and Evolution of Biospheres, vol.9, issue.6, pp.189-200, 2007.
DOI : 10.1007/s11084-006-9024-7

. Nakamura, Sequence-specific error profile of Illumina sequencers, Nucleic Acids Research, vol.39, issue.13, 2011.
DOI : 10.1093/nar/gkr344

. Nakamura, Metagenomic Diagnosis of Bacterial Infections, Emerging Infectious Diseases, vol.14, issue.11, pp.1784-1786, 2008.
DOI : 10.3201/eid1411.080589

G. Navarro and R. Baeza-Yates, A hybrid indexing method for approximate string matching, J of Discrete Algorithms, vol.1, pp.205-239, 2000.

G. Navarro and V. Mäkinen, Compressed full-text indexes, ACM Computing Surveys, vol.39, issue.1, 2007.
DOI : 10.1145/1216370.1216372

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.144.186

. Nawrocki, Infernal 1.0: inference of RNA alignments, Bioinformatics, vol.25, issue.10, pp.1335-1342, 2009.
DOI : 10.1093/bioinformatics/btp157

S. Needleman and C. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, vol.48, issue.3, pp.443-453, 1970.
DOI : 10.1016/0022-2836(70)90057-4

. Newton, Pathogenesis, parasitism and mutualism in the trophic space of microbe-plant interactions, Trends in Microbiology, vol.18, issue.8, pp.365-373, 2010.
DOI : 10.1016/j.tim.2010.06.002

. Nicholson, Complete Genome Sequence of Serratia liquefaciens Strain ATCC 27592, Genome Announcements, vol.1, issue.4, pp.548-561, 2013.
DOI : 10.1128/genomeA.00548-13

P. Nyrén, The History of Pyrosequencing, Methods Mol Biol, vol.373, pp.1-14, 2007.
DOI : 10.1007/978-1-4939-2715-9_1

. Nystedt, The Norway spruce genome sequence and conifer genome evolution, Nature, vol.497, issue.7451, pp.579-584, 2013.
DOI : 10.1038/nature12211

. Olsen, Rapid assessment of extremal statistics for gapped local alignment, Proc Int Conf Intell Syst Mol Biol, pp.211-222, 1999.

S. Ouyang and C. R. Buell, The TIGR Plant Repeat Databases: a collective resource for the identification of repetitive sequences in plants, Nucleic Acids Research, vol.32, issue.90001, pp.360-363, 2004.
DOI : 10.1093/nar/gkh099

. Park, Estimating the Gumbel scale parameter for local alignment of random sequences by importance sampling with stopping times, The Annals of Statistics, vol.37, issue.6A, p.3697, 2009.
DOI : 10.1214/08-AOS663

W. Pearson, Searching protein sequence libraries: Comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms, Genomics, vol.11, issue.3, pp.635-650, 1991.
DOI : 10.1016/0888-7543(91)90071-L

W. Pearson and D. Lipman, Improved tools for biological sequence comparison, Proceedings of the National Academy of Sciences, vol.85, issue.8, pp.2444-2448, 1988.
DOI : 10.1073/pnas.85.8.2444

URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC280013/pdf

. Pisanti, RISOTTO: Fast Extraction of Motifs with Mismatches, Proceedings of the 7th Latin American Theoretical Informatics Symposium, 2006.
DOI : 10.1007/11682462_69

URL : https://hal.archives-ouvertes.fr/hal-00428023

. Pruesse, SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB, Nucleic Acids Research, vol.35, issue.21, pp.7188-7196, 2007.
DOI : 10.1093/nar/gkm864

. Pruitt, NCBI Reference Sequences: current status, policy and new initiatives, Nucleic Acids Research, vol.37, issue.Database, pp.32-36, 2009.
DOI : 10.1093/nar/gkn721

URL : http://doi.org/10.1093/nar/gkn721

. Puglisi, A taxonomy of suffix array construction algorithms, ACM Computing Surveys, vol.39, issue.2, pp.1-31, 2007.
DOI : 10.1145/1242471.1242472

. Qi, CVTree: a phylogenetic tree reconstruction tool based on whole genomes, Nucleic Acids Research, vol.32, issue.Web Server, pp.45-52, 2004.
DOI : 10.1093/nar/gkh362

. Qin, A human gut microbial gene catalogue established by metagenomic sequencing, Nature, vol.464, issue.7285, pp.59-65, 2010.
DOI : 10.1038/nature08821

URL : https://hal.archives-ouvertes.fr/cea-00908974

. Reis, CMPH: C Minimal Perfect Hashing library, 2012.

. Richter, MetaSim: A Sequencing Simulator for Genomics and Metagenomics, PLoS ONE, vol.3, issue.10, p.e3373, 2008.
DOI : 10.1371/journal.pone.0003373

. Roberts, The advantages of SMRT sequencing, Genome Biology, vol.14, p.405, 2013.

T. Rognes, Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation, BMC Bioinformatics, vol.12, issue.1, p.221, 2011.
DOI : 10.1186/1471-2105-12-221

. Ronaghi, Real-Time DNA Sequencing Using Detection of Pyrophosphate Release, Analytical Biochemistry, vol.242, issue.1, pp.84-89, 1996.
DOI : 10.1006/abio.1996.0432

. Ronaghi, DNA Sequencing: A Sequencing Method Based on Real-Time Pyrophosphate, Science, vol.281, issue.5375, pp.363-365, 1998.
DOI : 10.1126/science.281.5375.363

. Rothberg, An integrated semiconductor device enabling non-optical genome sequencing, Nature, vol.475, issue.7356, pp.348-352, 2011.
DOI : 10.1038/nature10242

R. B. Russell and G. J. Barton, Multiple protein sequence alignment from tertiary structure comparison: assignment of global and residue confidence levels, Proteins, vol.14, pp.309-323, 1992.

. Russo, Approximate String Matching with Compressed Indexes, Algorithms, vol.2, issue.3, pp.1105-1136, 2009.
DOI : 10.3390/a2031105

N. Saitou and S. Ueda, Evolutionary rates of insertion and deletion in noncoding nucleotide sequences of Primates, Mol Biol and Evol, vol.11, pp.504-512, 1994.

R. Salama and D. Stekel, A non-independent energy-based multiple sequence alignment improves prediction of transcription factor binding sites, Bioinformatics, vol.29, issue.21, 2013.
DOI : 10.1093/bioinformatics/btt463

. Sanger, Nucleotide sequence of bacteriophage φX174 DNA, Nature, vol.265, issue.5596, pp.687-695, 1977.
DOI : 10.1038/265687a0

. Schmieder, Identification and removal of ribosomal RNA sequences from metatranscriptomes, Bioinformatics, vol.28, issue.3, pp.433-435, 2012.
DOI : 10.1093/bioinformatics/btr669

K. Schulz and S. Mihov, Fast string correction with Levenshtein automata, International Journal on Document Analysis and Recognition, vol.5, issue.1, pp.67-85, 2002.
DOI : 10.1007/s10032-002-0082-8

N. Segata and C. Huttenhower, Toward an Efficient Method of Identifying Core Genes for Evolutionary and Functional Microbial Phylogenies, PLoS ONE, vol.6, issue.9, p.e24704, 2011.
DOI : 10.1371/journal.pone.0024704

. Sheetlin, The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment, Nucleic Acids Research, vol.33, issue.15, pp.4987-4994, 2005.
DOI : 10.1093/nar/gki800

R. Sinha and J. Zobel, Cache-conscious sorting of large sets of strings with dynamic tries, Journal of Experimental Algorithmics, vol.9, 2004.
DOI : 10.1145/1005813.1041517

. Sinha, Cache-efficient string sorting using copying, Journal of Experimental Algorithmics, vol.11, p.11, 2006.
DOI : 10.1145/1187436.1187439

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.3498

. Smith, Fluorescence detection in automated DNA sequence analysis, Nature, vol.321, issue.6071, pp.674-679, 1986.
DOI : 10.1038/321674a0

T. Smith and M. Waterman, Identification of common molecular subsequences, Journal of Molecular Biology, vol.147, issue.1, pp.195-197, 1981.
DOI : 10.1016/0022-2836(81)90087-5

R. Sorek and P. Cossart, Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity, Nature Reviews Genetics, vol.11, issue.1, pp.9-16, 2010.
DOI : 10.1038/nrg2695

R. Staden, A strategy of DNA sequencing employing computer programs, Nucleic Acids Research, vol.6, issue.7, 1979.
DOI : 10.1093/nar/6.7.2601

. States, Improved sensitivity of nucleic acid database searches using application-specific scoring matrices, Methods, vol.3, issue.1, pp.66-70, 1991.
DOI : 10.1016/S1046-2023(05)80165-3

J. Sved and A. Bird, The expected equilibrium of the CpG dinucleotide in vertebrate genomes under a mutation model, Proceedings of the National Academy of Sciences, vol.87, issue.12, pp.4692-4696, 1990.
DOI : 10.1073/pnas.87.12.4692

. Turnbaugh, The Human Microbiome Project, Nature, vol.449, issue.7164, pp.804-810, 2007.
DOI : 10.1038/nature06244

E. Ukkonen, On-line construction of suffix trees, Algorithmica, vol.14, issue.3, pp.249-260, 1995.
DOI : 10.1007/BF01206331

. Venter, The Sequence of the Human Genome, Science, vol.291, issue.5507, pp.1304-1351, 2001.
DOI : 10.1126/science.1058040

URL : https://hal.archives-ouvertes.fr/hal-00465088

. Walser, CpG dinucleotides and the mutation rate of non-CpG DNA, Genome Research, vol.18, issue.9, pp.1403-1414, 2008.
DOI : 10.1101/gr.076455.108

. Wandelt, Data management challenges in next generation sequencing, Datenbank-Spektrum, pp.161-171, 2012.

K. Wang and R. Samudrala, Incorporating background frequency improves entropy-based residue conservation measures, BMC Bioinformatics, vol.7, issue.1, p.385, 2006.
DOI : 10.1186/1471-2105-7-385

M. S. Waterman and M. Vingron, Rapid and accurate estimates of statistical significance for sequence data base searches, Proceedings of the National Academy of Sciences, vol.91, issue.11, pp.4625-4628, 1994.
DOI : 10.1073/pnas.91.11.4625

P. Weiner, Linear pattern matching algorithms, 14th Annual Symposium on Switching and Automata Theory (swat 1973), pp.1-11, 1973.
DOI : 10.1109/SWAT.1973.13

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.474.9582

W. Wilbur and D. Lipman, Rapid similarity searches of nucleic acid and protein data banks, Proceedings of the National Academy of Sciences, vol.80, issue.3, pp.726-730, 1983.
DOI : 10.1073/pnas.80.3.726

. Wommack, Metagenomics: Read Length Matters, Applied and Environmental Microbiology, vol.74, issue.5, pp.1453-1463, 2008.
DOI : 10.1128/AEM.02181-07

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2258652

Z. Yang and A. Yoder, Estimation of the transition/transversion rate bias and species sampling, J Mol Evol, vol.48, pp.274-283, 1999.

. Yi, Duplex-specific nuclease efficiently removes rRNA for prokaryotic RNA-seq, Nucleic Acids Research, vol.39, issue.20, 2011.
DOI : 10.1093/nar/gkr617

. Zhao, SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications, PLoS ONE, vol.8, issue.12, p.e82138, 2013.
DOI : 10.1371/journal.pone.0082138

Z. Zhao and E. Boerwinkle, Neighboring-Nucleotide Effects on Single Nucleotide Polymorphisms: A Study of 2.6 Million Polymorphisms Across the Human Genome, Genome Research, vol.12, issue.11, pp.1679-1686, 2002.
DOI : 10.1101/gr.287302

L. Lalunaebella, -2) and local (3-4) alignments for strings, p.16

. Durbin, A diagram of the relationships between the three states used for affine gap alignment, 1998.

A preorder traversal of this suffix tree, beginning from the root node (marked as start), yields all suffixes of x in lexicographical order. The leaf nodes hold the starting positions of the suffixes. To search for all occurrences of a string, we begin at the root node and follow the edges that match the characters of our string. The string occurs in x (as a prefix of at least one suffix) if we exhaust all of its characters before or at a leaf node. For example, if we search for the string s = ata, we finish at the inner node marked with [6,2]. The green dashed path links together all leaf nodes in lexicographical order, and the [x,y] label at each inner node (except the root) gives the first and last leaf positions reachable from that inner node. Both are optional, as they are only useful for finding all of the locations at which s occurs (other methods exist too). To find all positions at which s occurs, we descend to the lexicographically least leaf node below the inner node and output its position (here 6); we then follow the path linking the leaf nodes and output their positions until we reach the last position (here 2).
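The search described in this caption can be illustrated with a minimal Python sketch. It makes several assumptions that are not in the original: it uses a naive, uncompressed suffix trie rather than a true suffix tree, it collects leaf positions by a sorted preorder traversal instead of following leaf links or [x,y] labels, and it uses the hypothetical example string `banana$` with 0-based positions, because the string x of the figure is not recoverable here. The names `Node`, `build_suffix_trie`, `leaves_in_order` and `find_occurrences` are illustrative only.

```python
class Node:
    """One trie node; `position` is set only on leaves (suffix start index)."""

    def __init__(self):
        self.children = {}
        self.position = None


def build_suffix_trie(text):
    """Insert every suffix of `text` character by character (O(n^2) sketch).

    `text` is assumed to end with a unique terminator such as '$', so that
    no suffix is a prefix of another and every suffix ends at a leaf.
    """
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
        node.position = start
    return root


def leaves_in_order(node):
    """Preorder traversal visiting children in sorted order.

    Yields leaf positions in lexicographical order of their suffixes.
    """
    if node.position is not None:
        yield node.position
    for ch in sorted(node.children):
        yield from leaves_in_order(node.children[ch])


def find_occurrences(root, pattern):
    """Follow the edges matching `pattern`, then list all leaves below."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []  # pattern does not occur in the text
    return list(leaves_in_order(node))


text = "banana$"  # hypothetical example string, not the x of the figure
root = build_suffix_trie(text)
occurrences = find_occurrences(root, "ana")
all_suffixes = [text[i:] for i in leaves_in_order(root)]
```

With this example string, `occurrences` lists the start positions of `ana` in the lexicographical order of the suffixes that begin there, and `all_suffixes` comes out sorted. A real suffix tree compresses unary paths into single edges and can be built in linear time, which this sketch does not attempt.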
