Computing Genomic Distances: An Algorithmic Viewpoint

Guillaume Fertin; Irena Rusu

doi:10.1002/9780470892107.ch34

Book chapter. Year: 2011

Guillaume Fertin

Role: Corresponding author
PersonId: 11485
IdHAL: guillaume-fertin
ORCID: 0000-0002-8251-2012
IdRef: 095050612

Laboratoire d'Informatique de Nantes Atlantique

Irena Rusu

Role: Author
PersonId: 16772
IdHAL: irena
IdRef: 095050671

Laboratoire d'Informatique de Nantes Atlantique

Abstract

Comparative genomics is a field of bioinformatics in which the goal is to compare several species by comparing their genomes, in order to understand how the species under study have evolved over time. Such studies lead, for instance, to reconstructing putative ancestral genomes, building phylogenetic trees, or inferring the function of genes or sets of genes. One of the main activities in comparative genomics consists in comparing pairs of genomes in order to identify their common features, and thus also to determine what differentiates them. In that case, genomes are usually modeled as sequences of genes, where each gene is identified by a (possibly signed) label. The sign + or -, if present, indicates on which DNA strand the gene lies. In that context, the order of the genes in the studied genomes is the main information we are given. Note that how this order was obtained is outside our scope here: only the order itself is taken into account. It should also be noted that genomes may contain several occurrences of the same gene (possibly carrying different signs, if signs are present). In that case, we say that the genome contains duplicates. Indeed, genes may be duplicated during evolution, and duplicate genes occur frequently in all living species. Comparing pairs of genomes on that basis can roughly be done in two different ways:
1. Compare the structure of the two genomes under study by computing a measure that represents the (dis)similarity between them.
2. Infer the evolutionary process leading from one genome to the other. For this, one needs to consider one or several operations (called rearrangements) that can occur in a genome during evolution, e.g. inversions or translocations; the goal is then to determine the most parsimonious (i.e., least costly) rearrangement scenario that transforms one genome into the other.
In this chapter, we focus only on option 1 above.
This static viewpoint has the advantage of allowing us to identify conserved regions between genomes, which is not the case with option 2. Note also that, although the term distance is very often used for option 1 (as in the title of this chapter), it refers only to evolutionary distance, i.e., the amount of change that occurred during the evolutionary process. Indeed, the so-called "distances" defined in the literature are rarely mathematical distances (metrics): they are measures that evaluate the differences and similarities resulting from evolution between the two genomes, either by directly counting the number of changes or, in a complementary way, by counting the conserved regions. Hence, in the following, we use the term measure rather than distance. The purpose of this chapter is to present some algorithmic aspects of pairwise genome comparison, when such comparisons aim at computing a (dis)similarity measure. More precisely, we present several algorithms that were recently proposed for solving (exactly or approximately) several variants of the problem. Our goal is not to survey exhaustively all existing results on the topic, but rather to give a sample of the different algorithmic ideas and techniques that have been used to answer some of these problems. Besides presenting original and nontrivial concepts that we think will interest the reader, this sample also gives a flavor of the inventiveness and richness of recent research on the subject.
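
To make the genome model and the two kinds of measures concrete, here is a minimal Python sketch. It is our own illustration, not code from the chapter. It assumes linear, single-chromosome genomes encoded as lists of nonzero signed integers (+g and -g denote gene g on either strand), with identical gene content and no duplicates. The helper names `has_duplicates`, `adjacencies` and `breakpoint_counts` are hypothetical. The sketch computes one classical pair of complementary quantities: conserved adjacencies (a count of conserved regions) and breakpoints (a count of changes).

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical encoding: a genome is a list of signed gene labels,
# where the sign (+g / -g) gives the DNA strand the gene lies on.
Genome = List[int]


def has_duplicates(genome: Genome) -> bool:
    """True if some gene (ignoring its sign) occurs more than once."""
    counts = Counter(abs(g) for g in genome)
    return any(c > 1 for c in counts.values())


def adjacencies(genome: Genome) -> Set[Tuple[int, int]]:
    """Adjacencies of a linear genome, each stored in a canonical orientation.

    Reading (a, b) on one strand is the same adjacency as reading (-b, -a)
    on the other strand, so we keep the smaller of the two tuples.
    """
    return {min((a, b), (-b, -a)) for a, b in zip(genome, genome[1:])}


def breakpoint_counts(g1: Genome, g2: Genome) -> Tuple[int, int]:
    """Return (conserved adjacencies, breakpoints) of g1 with respect to g2.

    Assumption of this sketch: no duplicates and identical gene content.
    """
    if has_duplicates(g1) or has_duplicates(g2):
        raise ValueError("this sketch only handles genomes without duplicates")
    if {abs(g) for g in g1} != {abs(g) for g in g2}:
        raise ValueError("genomes must have the same gene content")
    conserved = len(adjacencies(g1) & adjacencies(g2))
    # Without duplicates, g1 has exactly len(g1) - 1 distinct adjacencies;
    # every one that is not conserved in g2 is a breakpoint.
    total = max(len(g1) - 1, 0)
    return conserved, total - conserved
```

For example, `breakpoint_counts([1, 2, 3, 4, 5], [1, -3, -2, 4, 5])` returns `(2, 2)`: inverting the segment 2 3 breaks the adjacencies 1|2 and 3|4, while 2 3 (read as -3 -2 on the other strand) and 4 5 are conserved. The sketch deliberately rejects genomes with duplicates rather than attempting to match gene occurrences between the two genomes.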

Subjects

Computational Complexity [cs.CC]; Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]; Data Structures and Algorithms [cs.DS]

Main file

Fertin-Rusu-Chapter.pdf (286.15 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00606146

Submitted on: Tuesday, 5 July 2011, 15:10:18

Last modified on: Friday, 5 January 2024, 03:22:10

Long-term archiving on: Thursday, 6 October 2011, 02:25:06

Dates and versions

hal-00606146, version 1 (05-07-2011)

Identifiers

HAL Id: hal-00606146, version 1
DOI: 10.1002/9780470892107.ch34

Cite

Guillaume Fertin, Irena Rusu. Computing Genomic Distances: An Algorithmic Viewpoint. In: Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications, Wiley Science, pp. 773-798, 2011. ⟨10.1002/9780470892107.ch34⟩. ⟨hal-00606146⟩

Collections

UNIV-NANTES, CNRS, LINA, LINA-COMBI, LS2N, NANTES-UNIVERSITE