Computing Genomic Distances: An Algorithmic Viewpoint

Abstract: Comparative genomics is a field of bioinformatics whose goal is to compare several species by comparing their genomes, in order to understand how the species under study have evolved over time. Such studies lead, for instance, to reconstructing putative ancestral genomes, building phylogenetic trees, or inferring the function of genes or sets of genes. One of the main activities in comparative genomics consists in comparing pairs of genomes, in order to identify their common features and thus also to determine what differentiates them. In that case, genomes are usually modeled as sequences of genes, where a gene is identified by a (possibly signed) label. The sign + or -, if present, indicates on which DNA strand the gene lies. In that context, the order of the genes in the studied genomes is the main information we are given. Note that the way this order was obtained is outside our scope here: only the order itself is taken into account. It should also be noted that genomes may contain several occurrences of the same gene (possibly carrying different signs, if signs are present). In that case, we say that a genome contains duplicates. Indeed, genes may be duplicated during evolution, and duplicate genes actually occur frequently in all living species.

Comparing pairs of genomes on that basis can roughly be done in two different ways:

1. Compare the structure of the two genomes under study by computing a measure that represents the (dis)similarity between the genomes.
2. Infer the evolution process from one genome to another. For this, one needs to consider one or several operations (called rearrangements) that can occur in a genome during evolution, e.g. inversions or translocations; the goal is then to determine the most parsimonious (i.e., least costly) rearrangement scenario that leads from one genome to the other.

In this chapter, we focus only on option 1 above. This static viewpoint has the advantage of allowing us to identify conserved regions between genomes, which is not the case with option 2. Note also that, although the term distance is very often used for option 1 (as is done in the title of this chapter), it only refers to evolutionary distance, i.e. the amount of change that occurred during the evolution process. Indeed, the so-called "distances" that have been defined in the literature are rarely mathematical distances: they are measures that evaluate the differences and similarities resulting from evolution between the two genomes, either by directly counting the number of changes or, in a complementary way, by counting the conserved regions. Hence, in the following, we use the term measure rather than distance.

The purpose of this chapter is to present some algorithmic aspects of pairwise genome comparisons, when those comparisons aim at finding a (dis)similarity measure. More precisely, we present several algorithms that were proposed recently for solving (exactly or approximately) several variants of the problem. Our goal is not to survey exhaustively all the existing results on that topic, but rather to give a sample of the different algorithmic ideas and techniques that have been used to answer some of the problems. Besides presenting original and nontrivial concepts that we think are of interest to the reader, this sample also gives a flavor of the inventiveness and richness of recent research on the subject.
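To make the model in the abstract concrete, here is a minimal Python sketch of a genome as a sequence of signed gene labels, together with one simple dissimilarity measure, the number of breakpoints. This is an illustration only and is not taken from the chapter: the function names are hypothetical, genes are assumed to be nonzero integers, genomes are assumed to be linear and free of duplicates, and the two ends of each genome are not treated as adjacencies.

```python
from typing import List, Set, Tuple

# A genome is a sequence of signed gene labels, e.g. [1, -3, 2].
# The sign encodes the DNA strand; labels are assumed to be nonzero.
Genome = List[int]


def has_duplicates(genome: Genome) -> bool:
    """Return True if some gene (ignoring its sign) occurs more than once."""
    labels = [abs(g) for g in genome]
    return len(labels) != len(set(labels))


def adjacencies(genome: Genome) -> Set[Tuple[int, int]]:
    """Collect every pair of consecutive genes, in both reading directions.

    Reading the pair (a, b) on the opposite strand gives (-b, -a),
    so both forms are stored.
    """
    pairs: Set[Tuple[int, int]] = set()
    for a, b in zip(genome, genome[1:]):
        pairs.add((a, b))
        pairs.add((-b, -a))
    return pairs


def breakpoints(g1: Genome, g2: Genome) -> int:
    """Count adjacencies of g1 that are not conserved in g2.

    Assumption of this sketch: both genomes contain the same genes,
    each exactly once (no duplicates).
    """
    if has_duplicates(g1) or has_duplicates(g2):
        raise ValueError("this sketch assumes genomes without duplicates")
    if sorted(abs(g) for g in g1) != sorted(abs(g) for g in g2):
        raise ValueError("both genomes must contain the same genes")
    conserved = adjacencies(g2)
    return sum((a, b) not in conserved for a, b in zip(g1, g1[1:]))
```

For example, `breakpoints([1, 2, 3, 4, 5], [1, -3, -2, 4, 5])` counts the adjacencies (1, 2) and (3, 4) as broken, while (2, 3) is still conserved because it appears as (-3, -2) on the opposite strand. Handling duplicates requires choosing which occurrences of a gene correspond to each other, which this sketch deliberately leaves out.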

https://hal.archives-ouvertes.fr/hal-00606146
Contributor: Guillaume Fertin
Submitted on: Tuesday, July 5, 2011 - 3:10:18 PM
Last modification on: Thursday, April 5, 2018 - 10:36:48 AM
Document(s) archived on: Thursday, October 6, 2011 - 2:25:06 AM

File

Fertin-Rusu-Chapter.pdf
Files produced by the author(s)


Citation

Guillaume Fertin, Irena Rusu. Computing Genomic Distances: An Algorithmic Viewpoint. Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications, Wiley Science, pp.773-798, 2011, doi:10.1002/9780470892107.ch34, hal-00606146.
