On the Approximability of Comparing Genomes with Duplicates

Abstract: A central problem in comparative genomics is to compute a (dis)similarity measure between two genomes, e.g. in order to construct a phylogenetic tree. Many such measures have been proposed in the recent past: number of reversals, number of breakpoints, number of common or conserved intervals, SAD, etc. In their initial definitions, all these measures assume that genomes contain no duplicates. However, it is now known that genes can be duplicated within the same genome. One possible approach to overcome this difficulty is to establish a one-to-one correspondence (i.e. a matching) between genes of both genomes, where the correspondence is chosen in order to optimize the studied measure. Then, after relabeling the genes according to this matching and deleting the unmatched signed genes, two genomes without duplicates are obtained and the measure can be computed. In this paper, we are interested in three measures (number of breakpoints, number of common intervals and number of conserved intervals) and three models of matching (exemplar model, maximum matching model and non-maximum matching model). We prove that, for each model and each measure, computing a matching between two genomes that optimizes the measure is APX-hard. We show that this result remains true even for two genomes G1 and G2 such that G1 contains no duplicates and no gene of G2 appears more than twice. Therefore, our results extend those of [5–7]. Finally, we propose a 4-approximation algorithm for a measure closely related to the number of breakpoints, the number of adjacencies, under the maximum matching model, in the case where genomes contain the same number of duplications of each gene.
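To make the pipeline in the abstract concrete (choose a matching, delete unmatched genes, then compute the measure), here is a minimal sketch for the number of breakpoints under the exemplar model. It is not the paper's algorithm: it is an exhaustive search, exponential in the number of duplicated genes, which is in line with the hardness results stated above. Several things are assumptions made for illustration: genomes are encoded as lists of signed integers, the usual signed definition of a breakpoint is used (a consecutive pair (a, b) of G1 is an adjacency if (a, b) or (-b, -a) is consecutive in G2, and a breakpoint otherwise), G1 is duplicate-free, every gene of G1 occurs in G2, and the function names `count_breakpoints` and `exemplar_breakpoints` are hypothetical.

```python
from itertools import product


def count_breakpoints(g1, g2):
    """Breakpoints between two duplicate-free signed genomes on the same genes.

    Assumed definition: a consecutive pair (a, b) of g1 is an adjacency if
    (a, b) or (-b, -a) is consecutive in g2; otherwise it is a breakpoint.
    """
    adjacent = set()
    for a, b in zip(g2, g2[1:]):
        adjacent.add((a, b))      # pair read left to right
        adjacent.add((-b, -a))    # same pair read on the reverse strand
    return sum((a, b) not in adjacent for a, b in zip(g1, g1[1:]))


def exemplar_breakpoints(g1, g2):
    """Brute-force exemplar model: keep exactly one copy of each gene in g2.

    Assumes g1 has no duplicates and every gene of g1 occurs in g2.
    Returns (minimum number of breakpoints, pruned copy of g2).
    """
    families = sorted({abs(g) for g in g1})
    # Positions in g2 of every occurrence of each gene family.
    positions = {f: [i for i, g in enumerate(g2) if abs(g) == f] for f in families}
    best = None
    # Try every way of choosing one occurrence per family.
    for choice in product(*(positions[f] for f in families)):
        kept = set(choice)
        pruned = [g for i, g in enumerate(g2) if i in kept]
        d = count_breakpoints(g1, pruned)
        if best is None or d < best[0]:
            best = (d, pruned)
    return best


g1 = [1, 2, 3, 4]          # no duplicates
g2 = [1, 3, 2, -3, 4]      # gene 3 appears twice
print(exemplar_breakpoints(g1, g2))  # (2, [1, 2, -3, 4])
```

In this toy instance, keeping the first copy of gene 3 yields 3 breakpoints, while keeping the second (reversed) copy yields 2. For duplicate-free genomes on n genes, the number of adjacencies is n - 1 minus the number of breakpoints.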
Document type:
Conference paper
2nd Workshop on Algorithms and Computation (WALCOM 2008), 2008, Dhaka, Bangladesh. Springer-Verlag, Lecture Notes in Computer Science (LNCS) 4921, pp. 34-45, 2008

https://hal.archives-ouvertes.fr/hal-00416492
Contributor: Guillaume Fertin
Submitted on: Tuesday, 15 September 2009 - 10:23:56
Last modified on: Thursday, 5 April 2018 - 10:36:49
Document(s) archived on: Tuesday, 15 June 2010 - 20:44:50

File

WALCOM08.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00416492, version 1

Citation

Sébastien Angibaud, Guillaume Fertin, Irena Rusu. On the Approximability of Comparing Genomes with Duplicates. 2nd Workshop on Algorithms and Computation (WALCOM 2008), 2008, Dhaka, Bangladesh. Springer-Verlag, Lecture Notes in Computer Science (LNCS) 4921, pp. 34-45, 2008. 〈hal-00416492〉

Metrics

Record views

148

File downloads

91