On the Approximability of Comparing Genomes with Duplicates

Sébastien Angibaud; Guillaume Fertin; Irena Rusu; Annelyse Thévenin; Stéphane Vialette

Journal article: Journal of Graph Algorithms and Applications, Year: 2009

Sébastien Angibaud

Role: Author

Laboratoire d'Informatique de Nantes Atlantique

Guillaume Fertin

Role: Corresponding author
PersonId : 11485
IdHAL : guillaume-fertin
ORCID : 0000-0002-8251-2012
IdRef : 095050612

Laboratoire d'Informatique de Nantes Atlantique

Irena Rusu

Role: Author
PersonId : 16772
IdHAL : irena
IdRef : 095050671

Laboratoire d'Informatique de Nantes Atlantique

Annelyse Thévenin

Role: Author

Laboratoire de Recherche en Informatique

Stéphane Vialette

Role: Author
PersonId : 3062
IdHAL : stephane-vialette
ORCID : 0000-0003-2308-6970
IdRef : 061620734

Laboratoire d'Informatique Gaspard-Monge

Abstract

A central problem in comparative genomics is to compute a (dis)similarity measure between two genomes, e.g. in order to construct a phylogenetic tree. A large number of such measures have been proposed in the recent past: number of reversals, number of breakpoints, number of common or conserved intervals, etc. In their initial definitions, all these measures assume that genomes contain no duplicates. However, it is now known that genes can be duplicated within the same genome. One possible approach to overcome this difficulty is to establish a one-to-one correspondence (i.e. a matching) between genes of both genomes, where the correspondence is chosen in order to optimize the studied measure. Then, after relabeling the genes according to this matching and deleting the unmatched signed genes, two genomes without duplicates are obtained and the measure can be computed. In this paper, we are interested in three measures (number of breakpoints, number of common intervals and number of conserved intervals) and three models of matching (exemplar, intermediate and maximum matching models). We prove that, for each model and each measure M, computing a matching between two genomes that optimizes M is APX-hard. We show that this result remains true even for two genomes G1 and G2 such that G1 contains no duplicates and no gene of G2 appears more than twice. Therefore, our results extend those of [7, 10, 13]. Besides, in order to evaluate the possible existence of approximation algorithms concerning the number of breakpoints, we also study the complexity of the following decision problem: is there an exemplarization (resp. an intermediate matching, a maximum matching) that induces no breakpoint? In particular, we extend a result of [13] by proving the problem to be NP-complete in the exemplar model for a new class of instances, we note that the problems are equivalent in the intermediate and the exemplar models, and we show that the problem is in P in the maximum matching model. Finally, we focus on a fourth measure, closely related to the number of breakpoints: the number of adjacencies, for which we give several constant-ratio approximation algorithms in the maximum matching model, in the case where genomes contain the same number of duplications of each gene.
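To make the exemplar model concrete, here is a minimal illustrative sketch and not the authors' method. It is a brute-force search that keeps one occurrence of each gene family in each genome and minimizes the number of breakpoints. It assumes genomes are lists of nonzero signed integers over the same gene families, and it uses the usual breakpoint definition (a consecutive pair (a, b) of the first genome is a breakpoint if neither (a, b) nor (-b, -a) is consecutive in the second). The function names are hypothetical, and the search is exponential, in line with the hardness results stated in the abstract.

```python
from itertools import product

def breakpoints(g1, g2):
    """Count breakpoints between two duplicate-free signed genomes on the same genes."""
    adjacent = set()
    for a, b in zip(g2, g2[1:]):
        adjacent.add((a, b))      # pair read left to right
        adjacent.add((-b, -a))    # same pair read on the reverse strand
    return sum((a, b) not in adjacent for a, b in zip(g1, g1[1:]))

def exemplarizations(genome):
    """Yield every genome obtained by keeping exactly one occurrence per gene family."""
    positions = {}
    for i, gene in enumerate(genome):
        positions.setdefault(abs(gene), []).append(i)
    for choice in product(*positions.values()):
        yield [genome[i] for i in sorted(choice)]

def exemplar_breakpoint_distance(g1, g2):
    """Brute-force minimum number of breakpoints over all pairs of exemplarizations."""
    if {abs(g) for g in g1} != {abs(g) for g in g2}:
        raise ValueError("this sketch assumes both genomes share the same gene families")
    return min(breakpoints(e1, e2)
               for e1 in exemplarizations(g1)
               for e2 in exemplarizations(g2))
```

Under these assumptions, the decision problem mentioned above (is there an exemplarization that induces no breakpoint?) corresponds to checking whether `exemplar_breakpoint_distance(g1, g2) == 0`, for instance with `g1 = [1, 2, 3, 4]` and `g2 = [1, -3, 2, 3, 4]`.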

Domains

Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]; Data Structures and Algorithms [cs.DS]; Computational Complexity [cs.CC]

Main file

JGAA.pdf (365.88 KB)

Origin: Files produced by the author(s)

Contributor: Guillaume Fertin

https://hal.science/hal-00416440

Submitted on: Monday, 14 September 2009, 11:20:31

Last modified on: Thursday, 28 March 2024, 03:28:14

Long-term archiving on: Tuesday, 15 June 2010, 21:51:33

Dates and versions

hal-00416440 , version 1 (14-09-2009)

Identifiers

HAL Id: hal-00416440, version 1

Cite

Sébastien Angibaud, Guillaume Fertin, Irena Rusu, Annelyse Thévenin, Stéphane Vialette. On the Approximability of Comparing Genomes with Duplicates. Journal of Graph Algorithms and Applications, 2009, 13 (1), pp.19-53. ⟨hal-00416440⟩

Collections

UNIV-NANTES ENPC EC-PARIS CNRS UNIV-MLV LINA LINA-COMBI LIGM_ALGO PARISTECH LIGM LIGM_MOA UMR8623 LS2N UNIV-PARIS-SACLAY UNIV-EIFFEL NANTES-UNIVERSITE LIGM_ADA JSE2024
