Replacing suffix trees with enhanced suffix arrays, Journal of Discrete Algorithms, vol.2, issue.1, pp.53-86, 2004. ,
DOI : 10.1016/S1570-8667(03)00065-0
DNAlignTT: Pairwise DNA alignment with sequence specific transition-transversion ratio, 2008 IEEE International Conference on Electro/Information Technology, pp.457-459, 2008. ,
DOI : 10.1109/EIT.2008.4554345
Plasmodium falciparum Isolates in India Exhibit a Progressive Increase in Mutations Associated with Sulfadoxine-Pyrimethamine Resistance, Antimicrobial Agents and Chemotherapy, vol.48, issue.3, pp.879-889, 2004. ,
DOI : 10.1128/AAC.48.3.879-889.2004
Local alignment statistics, Methods Enzymol, vol.266, pp.460-480, 1996. ,
DOI : 10.1016/S0076-6879(96)66029-7
Basic local alignment search tool, Journal of Molecular Biology, vol.215, issue.3, pp.403-410, 1990. ,
DOI : 10.1016/S0022-2836(05)80360-2
GenBank, Nucleic Acids Research, vol.33, issue.Database issue, pp.34-38, 2005. ,
DOI : 10.1093/nar/gki063
Accurate whole human genome sequencing using reversible terminator chemistry, Nature, vol.456, pp.53-59, 2008. ,
Rapid Exonuclease Digestion of PCR-Amplified Targets for Improved Microarray Hybridization, Clinical Chemistry, vol.53, issue.11, pp.2020-2023, 2007. ,
DOI : 10.1373/clinchem.2007.091157
Global phytoplankton decline over the past century, Nature, vol.104, issue.7306, pp.591-596, 2010. ,
DOI : 10.1038/nature09268
Dewatering microalgae by forward osmosis, Desalination, vol.312, pp.19-22, 2013. ,
DOI : 10.1016/j.desal.2012.12.015
A block-sorting lossless data compression algorithm, 1994. ,
The comparative RNA web (CRW) site: An online database of comparative sequence and structure information for ribosomal, intron, and other RNAs, BMC Bioinformatics, vol.3, issue.1, p.15, 2002. ,
DOI : 10.1186/1471-2105-3-15
n-step FMindex for faster pattern matching, International Conference on Computational Science, ICCS, pp.70-79, 2013. ,
Hierarchical and Spatially Explicit Clustering of DNA Sequences with BAPS Software, Molecular Biology and Evolution, vol.30, issue.5, pp.1224-1228, 2013. ,
DOI : 10.1093/molbev/mst028
Treedyn: towards dynamic graphics and annotations for analyses of trees, BMC Bioinformatics, vol.7, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-00321061
Scoring pairwise genomic sequence alignments, Biocomputing 2002, pp.115-141, 2002. ,
DOI : 10.1142/9789812799623_0012
The Origin of the Haitian Cholera Outbreak Strain, New England Journal of Medicine, vol.364, issue.1, pp.33-42, 2011. ,
DOI : 10.1056/NEJMoa1012928
Initial sequencing and analysis of the human genome, Nature, vol.409, pp.860-921, 2001. ,
SHRiMP2: Sensitive yet Practical Short Read Mapping, Bioinformatics, vol.27, issue.7, pp.1011-1012, 2011. ,
DOI : 10.1093/bioinformatics/btr046
Chapter 22: A model of evolutionary change in proteins, Atlas of Protein Sequence and Structure, 1978. ,
Efficient algorithms for folding and comparing nucleic acid sequences, Nucleic Acids Research, vol.10, issue.1, pp.197-206, 1982. ,
DOI : 10.1093/nar/10.1.197
Biological sequence analysis: probabilistic models of proteins and nucleic acids, chapter Pairwise alignment, 1998. ,
DOI : 10.1017/CBO9780511790492
Accelerated Profile HMM Searches, PLoS Computational Biology, vol.21, issue.10, p.1002195, 2011. ,
DOI : 10.1371/journal.pcbi.1002195.g006
Search and clustering orders of magnitude faster than BLAST, Bioinformatics, vol.26, issue.19, pp.2460-2461, 2010. ,
DOI : 10.1093/bioinformatics/btq461
MetaHIT: The European Union Project on Metagenomics of the Human Intestinal Tract, pp.307-316, 2011. ,
Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology, PLoS ONE, vol.15, issue.Suppl 1, 2012. ,
DOI : 10.1371/journal.pone.0047768.t003
Striped Smith-Waterman speeds database searches six times over other SIMD implementations, Bioinformatics, vol.23, issue.2, pp.156-161, 2007. ,
DOI : 10.1093/bioinformatics/btl582
Phylip -phylogeny inference package, Cladistics, vol.5, pp.164-166, 1989. ,
Opportunistic data structures with applications, Proceedings 41st Annual Symposium on Foundations of Computer Science, pp.390-398, 2000. ,
DOI : 10.1109/SFCS.2000.892127
Cross-biome metagenomic analyses of soil microbial communities and their functional attributes, Proceedings of the National Academy of Sciences, vol.109, issue.52, 2012. ,
DOI : 10.1073/pnas.1215210110
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd, Science, vol.269, issue.5223, pp.496-512, 1995. ,
DOI : 10.1126/science.7542800
Ensembl 2013, Nucleic Acids Research, vol.41, issue.D1, pp.48-55, 2013. ,
DOI : 10.1093/nar/gks1236
Second-generation environmental sequencing unmasks marine metazoan biodiversity, Nature Communications, vol.21, issue.7, 2010. ,
DOI : 10.1038/ncomms1095
Parameters for accurate genome alignment, BMC Bioinformatics, vol.11, issue.1, p.11, 2010. ,
DOI : 10.1186/1471-2105-11-80
URL : http://doi.org/10.1186/1471-2105-11-80
Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences, Genome Biology, vol.11, issue.8, pp.11-86, 2010. ,
DOI : 10.1186/gb-2010-11-8-r86
Life with 6000 Genes, Science, vol.274, issue.5287, pp.563-567, 1996. ,
DOI : 10.1126/science.274.5287.546
New indices for text: PAT trees and PAT arrays, Information Retrieval: Data Structures and Algorithms, pp.66-82, 1992. ,
An extreme value theory for long head runs, Probab Th Rel Fields, pp.279-287, 1986. ,
DOI : 10.1007/BF00699107
An improved algorithm for matching biological sequences, Journal of Molecular Biology, vol.162, issue.3, pp.705-708, 1982. ,
DOI : 10.1016/0022-2836(82)90398-9
Oxford nanopore introduces DNA 'strand sequencing' on the high-throughput GridION platform and presents MinION, a sequencer the size of a USB memory stick, 2012. ,
Algorithms on Strings, Trees, and Sequences, 1997. ,
DOI : 10.1017/CBO9780511574931
Benchmarking short sequence mapping tools, BMC Bioinformatics, vol.14, 2013. ,
Burst tries: a fast, efficient data structure for string keys, ACM Transactions on Information Systems, vol.20, issue.2, pp.192-223, 2002. ,
DOI : 10.1145/506309.506312
Amino acid substitution matrices from protein blocks, Proc Natl Acad Sci, 1992. ,
DOI : 10.1073/pnas.89.22.10915
URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC50453/pdf
The factor for bitterness in the sweet almond, Genetics, vol.8, pp.390-391, 1923. ,
Exploring nucleo-cytoplasmic large DNA viruses in Tara Oceans microbial metagenomes, The ISME Journal, vol.37, issue.9, 2013. ,
DOI : 10.1186/1743-422X-9-161
URL : https://hal.archives-ouvertes.fr/hal-01258223
Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures, PLoS Computational Biology, vol.12, issue.9, 2009. ,
DOI : 10.1371/journal.pcbi.1000502.t001
A novel and well-defined benchmarking method for second generation read mapping, BMC Bioinformatics, vol.12, issue.1, 2011. ,
DOI : 10.1186/1471-2105-9-11
BFAST: An Alignment Tool for Large Scale Genome Resequencing, PLoS ONE, vol.5, issue.11, 2009. ,
DOI : 10.1371/journal.pone.0007767.s001
Chapter 2.3.5 Equivalence of deterministic and nondeterministic finite automata, 2004. ,
Identification of ribosomal RNA genes in metagenomic fragments, Bioinformatics, vol.25, issue.10, pp.1338-1340, 2009. ,
DOI : 10.1093/bioinformatics/btp161
An analysis of substitution, deletion and insertion mutations in cancer genes, Nucleic Acids Research, vol.40, issue.14, pp.6401-6413, 2012. ,
DOI : 10.1093/nar/gks290
Cytosine methylation and CpG, TpG (CpA) and TpA frequencies, Gene, vol.333, pp.143-149, 2004. ,
DOI : 10.1016/j.gene.2004.02.043
16S rRNA Gene Sequencing for Bacterial Identification in the Diagnostic Laboratory: Pluses, Perils, and Pitfalls, Journal of Clinical Microbiology, vol.45, issue.9, pp.2761-2764, 2007. ,
DOI : 10.1128/JCM.01228-07
Global distribution of microbial abundance and biomass in subseafloor sediment, Proceedings of the National Academy of Sciences, vol.109, issue.40, pp.16213-16216, 2012. ,
DOI : 10.1073/pnas.1203849109
Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes, Proceedings of the National Academy of Sciences, vol.87, issue.6, pp.2264-2268, 1990. ,
DOI : 10.1073/pnas.87.6.2264
Comparing the genomes of Helicobacter pylori clinical strain UM032 and Mice-adapted derivatives, Gut Pathogens, vol.5, issue.1, 2013. ,
DOI : 10.1074/jbc.270.30.17771
Inverse Sequence Alignment from Partial Examples, Algorithms in Bioinformatics, 7th International Workshop (WABI), pp.359-370, 2007. ,
DOI : 10.1007/978-3-540-74126-8_33
Deciphering metatranscriptomic data, Methods in Molecular Biology, 2013. ,
SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data, Bioinformatics, vol.28, issue.24, pp.3211-3217, 2012. ,
DOI : 10.1093/bioinformatics/bts611
URL : https://hal.archives-ouvertes.fr/hal-00748990
Chapter 4 Sequence Similarity, pp.55-71, 2003. ,
Multiple sequence alignment: In pursuit of homologous DNA positions, Genome Research, vol.17, issue.2, pp.127-135, 2007. ,
DOI : 10.1101/gr.5232407
Reducing the space requirement of suffix trees, Software: Practice and Experience, vol.15, issue.13, pp.1149-1171, 1999. ,
DOI : 10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O
Versatile and open software for comparing large genomes, Genome Biol, vol.5, 2004. ,
Nucleotide composition bias and CpG dinucleotide content in the genomes of HIV and HTLV, Biochimica et Biophysica Acta (BBA) - Gene Structure and Expression, vol.1009, issue.3, pp.280-282, 1989. ,
DOI : 10.1016/0167-4781(89)90114-0
Compressed indexing and local alignment of DNA, Bioinformatics, vol.24, issue.6, pp.791-797, 2008. ,
DOI : 10.1093/bioinformatics/btn032
Fast gapped-read alignment with Bowtie 2, Nature Methods, vol.9, issue.4, pp.357-359, 2012. ,
DOI : 10.1038/nmeth.1923
URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3322381
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biology, vol.10, issue.3, 2009. ,
DOI : 10.1186/gb-2009-10-3-r25
rRNASelector: A computer program for selecting ribosomal RNA encoding sequences from metagenomic and metatranscriptomic shotgun libraries, The Journal of Microbiology, vol.21, issue.4, pp.689-91, 2011. ,
DOI : 10.1007/s12275-011-1213-z
A comprehensive metatranscriptome analysis pipeline and its validation using human small intestine microbiota datasets, BMC Genomics, vol.14, issue.1, p.530, 2013. ,
DOI : 10.1093/bib/bbs046
Novel genomic resources for a climate change sensitive mammal: characterization of the American pika transcriptome, BMC Genomics, vol.14, issue.1, 2013. ,
DOI : 10.1006/jmbi.2000.4315
The Impact of rRNA Secondary Structure Consideration in Alignment and Tree Reconstruction: Simulated Data and a Case Study on the Phylogeny of Hexapods, Molecular Biology and Evolution, vol.27, issue.11, pp.2507-2521, 2010. ,
DOI : 10.1093/molbev/msq140
Zero-Mode Waveguides for Single-Molecule Analysis at High Concentrations, Science, vol.299, issue.5607, pp.682-686, 2003. ,
DOI : 10.1126/science.1079700
Whole genome simulation, 2012. ,
Fast and accurate short read alignment with Burrows-Wheeler transform, Bioinformatics, vol.25, issue.14, pp.1754-60, 2009. ,
DOI : 10.1093/bioinformatics/btp324
URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705234
A survey of sequence alignment algorithms for next-generation sequencing, Briefings in Bioinformatics, vol.11, issue.5, pp.473-483, 2010. ,
DOI : 10.1093/bib/bbq015
Mapping short DNA sequencing reads and calling variants using mapping quality scores, Genome Research, vol.18, issue.11, pp.1851-1858, 2008. ,
DOI : 10.1101/gr.078212.108
SOAP2: an improved ultrafast tool for short read alignment, Bioinformatics, vol.25, issue.15, pp.1966-1967, 2009. ,
DOI : 10.1093/bioinformatics/btp336
Rapid and sensitive protein similarity searches, Science, vol.227, issue.4693, pp.1435-1441, 1985. ,
DOI : 10.1126/science.2983426
Some remarks on the cantor pairing function, pp.55-65, 2007. ,
Performance comparison of benchtop high-throughput sequencing platforms, Nature Biotechnology, vol.8, issue.5, pp.434-439, 2012. ,
DOI : 10.1371/journal.pgen.1000344
ARB: a software environment for sequence data, Nucleic Acids Research, vol.32, issue.4, pp.1363-1371, 2004. ,
DOI : 10.1093/nar/gkh293
Genomic Disorders: Molecular Mechanisms for Rearrangements and Conveyed Phenotypes, PLoS Genetics, vol.13, issue.6, p.49, 2005. ,
DOI : 10.1371/journal.pgen.0010049
PatternHunter: faster and more sensitive homology search, Bioinformatics, vol.18, issue.3, pp.440-445, 2002. ,
DOI : 10.1093/bioinformatics/18.3.440
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.8001
Reconstruction of phyletic trees by global alignment of multiple metabolic networks, BMC Bioinformatics, vol.14, issue.Suppl 2, p.12, 2013. ,
DOI : 10.1186/1471-2148-5-23
Suffix Arrays: A New Method for On-Line String Searches, SIAM Journal on Computing, vol.22, issue.5, pp.935-948, 1993. ,
DOI : 10.1137/0222058
The GEM mapper: fast, accurate and versatile alignment by filtration, Nature Methods, vol.485, issue.12, pp.1185-1188, 2012. ,
DOI : 10.1038/nmeth.2221
Analysis of read length limiting factors in Pyrosequencing chemistry, Analytical Biochemistry, vol.363, issue.2, pp.275-287, 2007. ,
DOI : 10.1016/j.ab.2007.02.002
A Space-Economical Suffix Tree Construction Algorithm, Journal of the ACM, vol.23, issue.2, pp.262-272, 1976. ,
DOI : 10.1145/321941.321946
Modeling a Minimal Ribosome Based on Comparative Sequence Analysis, Journal of Molecular Biology, vol.321, issue.2, pp.215-249, 2002. ,
DOI : 10.1016/S0022-2836(02)00568-5
OASIS, VLDB, pp.910-921, 2003. ,
DOI : 10.1016/B978-012722442-8/50085-9
Fast Approximate Search in Large Dictionaries, Computational Linguistics, vol.22, issue.1, pp.451-477, 2004. ,
DOI : 10.1002/spe.4380250307
Universal Levenshtein automata. building and properties, 2005. ,
Survival of Methanogenic Archaea from Siberian Permafrost under Simulated Martian Thermal Conditions, Origins of Life and Evolution of Biospheres, vol.9, issue.6, pp.189-200, 2007. ,
DOI : 10.1007/s11084-006-9024-7
Sequence-specific error profile of Illumina sequencers, Nucleic Acids Research, vol.39, issue.13, 2011. ,
DOI : 10.1093/nar/gkr344
Metagenomic Diagnosis of Bacterial Infections, Emerging Infectious Diseases, vol.14, issue.11, pp.1784-1786, 2008. ,
DOI : 10.3201/eid1411.080589
A hybrid indexing method for approximate string matching, J of Discrete Algorithms, vol.1, pp.205-239, 2000. ,
Compressed full-text indexes, ACM Computing Surveys, vol.39, issue.1, 2007. ,
DOI : 10.1145/1216370.1216372
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.144.186
Infernal 1.0: inference of RNA alignments, Bioinformatics, vol.25, issue.10, pp.1335-1342, 2009. ,
DOI : 10.1093/bioinformatics/btp157
A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, vol.48, issue.3, pp.443-453, 1970. ,
DOI : 10.1016/0022-2836(70)90057-4
Pathogenesis, parasitism and mutualism in the trophic space of microbe-plant interactions, Trends in Microbiology, vol.18, issue.8, pp.365-373, 2010. ,
DOI : 10.1016/j.tim.2010.06.002
Complete Genome Sequence of Serratia liquefaciens Strain ATCC 27592, Genome Announcements, vol.1, issue.4, pp.548-561, 2013. ,
DOI : 10.1128/genomeA.00548-13
The History of Pyrosequencing®, Methods Mol Biol, vol.373, pp.1-14, 2007. ,
DOI : 10.1007/978-1-4939-2715-9_1
The Norway spruce genome sequence and conifer genome evolution, Nature, vol.101, issue.7451, pp.579-584, 2013. ,
DOI : 10.1038/nature12211
Rapid assessment of extremal statistics for gapped local alignment, Proc Int Conf Intell Syst Mol Biol, pp.211-222, 1999. ,
The TIGR Plant Repeat Databases: a collective resource for the identification of repetitive sequences in plants, Nucleic Acids Research, vol.32, issue.90001, pp.360-363, 2004. ,
DOI : 10.1093/nar/gkh099
Estimating the Gumbel scale parameter for local alignment of random sequences by importance sampling with stopping times, The Annals of Statistics, vol.37, issue.6A, p.3697, 2009. ,
DOI : 10.1214/08-AOS663
Searching protein sequence libraries: Comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms, Genomics, vol.11, issue.3, pp.635-650, 1991. ,
DOI : 10.1016/0888-7543(91)90071-L
Improved tools for biological sequence comparison, Proceedings of the National Academy of Sciences, vol.85, issue.8, pp.2444-2448, 1988. ,
DOI : 10.1073/pnas.85.8.2444
URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC280013/pdf
RISOTTO: Fast Extraction of Motifs with Mismatches, Proceedings of the 7th Latin American Theoretical Informatics Symposium, 2006. ,
DOI : 10.1007/11682462_69
URL : https://hal.archives-ouvertes.fr/hal-00428023
SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB, Nucleic Acids Research, vol.35, issue.21, pp.7188-7196, 2007. ,
DOI : 10.1093/nar/gkm864
NCBI Reference Sequences: current status, policy and new initiatives, Nucleic Acids Research, vol.37, issue.Database, pp.32-36, 2009. ,
DOI : 10.1093/nar/gkn721
URL : http://doi.org/10.1093/nar/gkn721
A taxonomy of suffix array construction algorithms, ACM Computing Surveys, vol.39, issue.2, pp.1-31, 2007. ,
DOI : 10.1145/1242471.1242472
CVTree: a phylogenetic tree reconstruction tool based on whole genomes, Nucleic Acids Research, vol.32, issue.Web Server, pp.45-52, 2004. ,
DOI : 10.1093/nar/gkh362
A human gut microbial gene catalogue established by metagenomic sequencing, Nature, vol.13, issue.7285, pp.59-65, 2009. ,
DOI : 10.1038/nature08821
URL : https://hal.archives-ouvertes.fr/cea-00908974
CMPH: C Minimal Perfect Hashing library, 2012. ,
MetaSim: A Sequencing Simulator for Genomics and Metagenomics, PLoS ONE, vol.13, issue.7, p.3373, 2008. ,
DOI : 10.1371/journal.pone.0003373.s002
The advantages of SMRT sequencing, Genome Biology, vol.11, issue.6, p.405, 2013. ,
DOI : 10.1186/1471-2105-11-21
Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation, BMC Bioinformatics, vol.12, issue.1, p.221, 2011. ,
DOI : 10.1186/1471-2105-12-221
URL : http://doi.org/10.1186/1471-2105-12-221
Real-Time DNA Sequencing Using Detection of Pyrophosphate Release, Analytical Biochemistry, vol.242, issue.1, pp.84-89, 1996. ,
DOI : 10.1006/abio.1996.0432
DNA SEQUENCING: A Sequencing Method Based on Real-Time Pyrophosphate, Science, vol.281, issue.5375, pp.363-365, 1998. ,
DOI : 10.1126/science.281.5375.363
An integrated semiconductor device enabling non-optical genome sequencing, Nature, vol.32, issue.7, pp.348-352, 2011. ,
DOI : 10.1038/nature10242
Multiple protein sequence alignment from tertiary structure comparison: assignment of global and residue confidence levels, Proteins, vol.14, pp.309-323, 1992. ,
Approximate String Matching with Compressed Indexes, Algorithms, vol.2, issue.3, pp.1105-1136, 2009. ,
DOI : 10.3390/a2031105
Evolutionary rates of insertion and deletion in noncoding nucleotide sequences of Primates, Mol Biol and Evol, vol.11, pp.504-512, 1994. ,
A non-independent energy-based multiple sequence alignment improves prediction of transcription factor binding sites, Bioinformatics, vol.29, issue.21, 2013. ,
DOI : 10.1093/bioinformatics/btt463
Nucleotide sequence of bacteriophage φX174 DNA, Nature, vol.3, issue.5596, pp.687-695, 1977. ,
DOI : 10.1038/265687a0
Identification and removal of ribosomal RNA sequences from metatranscriptomes, Bioinformatics, vol.28, issue.3, pp.433-435, 2012. ,
DOI : 10.1093/bioinformatics/btr669
Fast string correction with Levenshtein automata, International Journal on Document Analysis and Recognition, vol.5, issue.1, pp.67-85, 2002. ,
DOI : 10.1007/s10032-002-0082-8
Toward an Efficient Method of Identifying Core Genes for Evolutionary and Functional Microbial Phylogenies, PLoS ONE, vol.452, issue.9, 2011. ,
DOI : 10.1371/journal.pone.0024704.s004
The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment, Nucleic Acids Research, vol.33, issue.15, pp.4987-4994, 2005. ,
DOI : 10.1093/nar/gki800
Cache-conscious sorting of large sets of strings with dynamic tries, Journal of Experimental Algorithmics, vol.9, issue.es, 2004. ,
DOI : 10.1145/1005813.1041517
Cache-efficient string sorting using copying, Journal of Experimental Algorithmics, vol.11, p.11, 2006. ,
DOI : 10.1145/1187436.1187439
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.3498
Fluorescence detection in automated DNA sequence analysis, Nature, vol.13, issue.6071, pp.674-679, 1986. ,
DOI : 10.1038/321674a0
Identification of common molecular subsequences, Journal of Molecular Biology, vol.147, issue.1, pp.195-197, 1981. ,
DOI : 10.1016/0022-2836(81)90087-5
Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity, Nature Reviews Genetics, vol.290, issue.1, pp.9-16, 2010. ,
DOI : 10.1038/nrg2695
A strategy of DNA sequencing employing computer programs, Nucleic Acids Research, vol.6, issue.7, 1979. ,
DOI : 10.1093/nar/6.7.2601
Improved sensitivity of nucleic acid database searches using application-specific scoring matrices, Methods, vol.3, issue.1, pp.66-70, 1991. ,
DOI : 10.1016/S1046-2023(05)80165-3
The expected equilibrium of the CpG dinucleotide in vertebrate genomes under a mutation model, Proceedings of the National Academy of Sciences, vol.87, issue.12, pp.4692-4696, 1990. ,
DOI : 10.1073/pnas.87.12.4692
The Human Microbiome Project, Nature, vol.112, issue.7164, pp.804-810, 2007. ,
DOI : 10.1038/nature06244
On-line construction of suffix trees, Algorithmica, vol.10, issue.3, pp.249-260, 1995. ,
DOI : 10.1007/BF01206331
The Sequence of the Human Genome, Science, vol.291, issue.5507, pp.1304-1351, 2001. ,
DOI : 10.1126/science.1058040
URL : https://hal.archives-ouvertes.fr/hal-00465088
CpG dinucleotides and the mutation rate of non-CpG DNA, Genome Research, vol.18, issue.9, pp.1403-1414, 2008. ,
DOI : 10.1101/gr.076455.108
Data management challenges in next generation sequencing, Datenbank-Spektrum, pp.161-171, 2012. ,
Incorporating background frequency improves entropy-based residue conservation measures, BMC Bioinformatics, vol.7, issue.1, p.385, 2006. ,
DOI : 10.1186/1471-2105-7-385
Rapid and accurate estimates of statistical significance for sequence data base searches, Proceedings of the National Academy of Sciences, vol.91, issue.11, pp.4625-4628, 1994. ,
DOI : 10.1073/pnas.91.11.4625
Linear pattern matching algorithms, 14th Annual Symposium on Switching and Automata Theory (swat 1973), pp.1-11, 1973. ,
DOI : 10.1109/SWAT.1973.13
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.474.9582
Rapid similarity searches of nucleic acid and protein data banks, Proceedings of the National Academy of Sciences, vol.80, issue.3, pp.726-730, 1983. ,
DOI : 10.1073/pnas.80.3.726
Metagenomics: Read Length Matters, Applied and Environmental Microbiology, vol.74, issue.5, pp.1453-1463, 2008. ,
DOI : 10.1128/AEM.02181-07
URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2258652
Estimation of the transition/transversion rate bias and species sampling, J Mol Evol, vol.48, pp.274-283, 1999. ,
Duplex-specific nuclease efficiently removes rRNA for prokaryotic RNA-seq, Nucleic Acids Research, vol.39, issue.20, 2011. ,
DOI : 10.1093/nar/gkr617
SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications, PLoS ONE, vol.21, issue.12, 2012. ,
DOI : 10.1371/journal.pone.0082138.s001
Neighboring-Nucleotide Effects on Single Nucleotide Polymorphisms: A Study of 2.6 Million Polymorphisms Across the Human Genome, Genome Research, vol.12, issue.11, pp.1679-1686, 2002. ,
DOI : 10.1101/gr.287302
-2) and local (3-4) alignments for strings, p.16 ,
A diagram of the relationships between the three states used for affine gap alignment, 1998. ,
A preorder traversal of this suffix tree, beginning from the root node (marked as start), yields all suffixes of x in lexicographical order. To search for all occurrences of a string, we begin at the root node and follow the edges that match the characters of our string. The string occurs in x (as a prefix of at least one suffix) if we exhaust all of its characters before or at a leaf node. For example, if we search for the string s = ata, we finish at the inner node marked with [6,2]. The leaf nodes hold the starting positions of the suffixes. The green dashed path links together all leaf nodes in lexicographical order, and the [x,y] label at each inner node (except the root) gives the positions of the first and last leaf nodes reachable from that inner node. Both are optional, as they are only useful for finding all of the locations at which s occurs (other methods exist too). To find all positions at which s occurs, we descend to the lexicographically least leaf node below the node where the search ended and output its position (here 6). Then we follow the path linking the leaf nodes and output their positions until we reach the last position (here 2), p.34 ,
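The search described in the caption above can be sketched in Python. This is only an illustration under stated assumptions, not the method of the cited work: instead of a suffix tree it uses a naively sorted list of suffix start positions (a simple suffix array), which exposes the same idea that all occurrences of s form one contiguous run in lexicographical order. The text "atatag", the function names, and the 0-based positions are hypothetical choices made for the example.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(x):
    """Return the start positions of all suffixes of x in lexicographical order.

    Naive construction (sorting the suffixes directly); fine for a small
    illustration, not for genome-sized inputs.
    """
    return sorted(range(len(x)), key=lambda i: x[i:])


def find_occurrences(x, sa, s):
    """Return the start positions of s in x, ordered as in the suffix array.

    Every suffix that begins with s lies in one contiguous block of the
    suffix array, so two binary searches delimit all occurrences. This
    plays the role of the [first, last] leaf labels in the suffix tree.
    """
    # Truncating sorted suffixes to len(s) keeps them in sorted order,
    # and makes every matching suffix compare equal to s.
    prefixes = [x[i:i + len(s)] for i in sa]
    first = bisect_left(prefixes, s)
    last = bisect_right(prefixes, s)
    return sa[first:last]


if __name__ == "__main__":
    x = "atatag"  # hypothetical example text
    sa = build_suffix_array(x)
    print(sa)                              # suffix start positions, sorted
    print(find_occurrences(x, sa, "ata"))  # start positions of "ata"
```

Walking the returned slice from its first to its last entry corresponds to following the linked leaf nodes from the first to the last reachable leaf.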
21 Chloroflexi_1, 6 Candidate division TM7, and 9 Lentisphaerae, p.59 ,