Comparing Genomes with Duplications: a Computational Complexity Point of View

In this paper, we study the computational complexity of computing (dis)similarity measures between two genomes that contain duplicated genes or genomic markers, a situation that arises frequently when comparing whole nuclear genomes. Several recent methods [1], [2] compute a given (dis)similarity measure M between two genomes G1 and G2 in two steps. First, they establish a one-to-one correspondence between the genes of G1 and the genes of G2. Second, this correspondence explicitly defines a permutation, so the (dis)similarity of the two genomes can be quantified with classical measures on permutations, such as the number of breakpoints. These methods therefore rely on two elements: a way to establish a one-to-one correspondence between the genes of a pair of genomes, and a (dis)similarity measure on permutations. The problem is then, given a (dis)similarity measure on permutations, to compute a correspondence that defines an optimal permutation for this measure. We consider two models for computing a one-to-one correspondence: the exemplar model, in which all copies but one of each gene family are deleted in both genomes, and the matching model, which computes a maximal correspondence for each gene family. We show that for both models, and for three (dis)similarity measures on permutations, namely the number of common intervals, the maximum adjacency disruption (MAD) number and the summed adjacency disruption (SAD) number, the problem of computing an optimal correspondence is NP-complete, and even APX-hard for the MAD and SAD numbers.
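The two-step scheme and the exemplar model can be illustrated with a small exhaustive-search sketch. It is not the authors' code, and every function name in it (`to_permutation`, `breakpoints`, `common_intervals`, `exemplarizations`, `best_exemplar`) is hypothetical. It assumes unsigned genomes given as lists of gene-family labels, with the same set of families in both genomes; breakpoints are counted without framing elements, and common intervals include every interval of length at least 2 (conventions vary, so these choices are assumptions). The matching model, MAD and SAD are not covered. Because every exemplarization is enumerated, the running time is exponential in the number of duplicated copies, so this is only usable on toy inputs.

```python
from itertools import product


def to_permutation(g1, g2):
    """Relabel g1 by positions in g2 (both must list each family exactly once)."""
    pos = {family: i for i, family in enumerate(g2)}
    return [pos[family] for family in g1]


def breakpoints(perm):
    """Unsigned breakpoints w.r.t. the identity, without framing elements."""
    return sum(1 for a, b in zip(perm, perm[1:]) if abs(a - b) != 1)


def common_intervals(perm):
    """Count intervals of length >= 2 whose values form a contiguous range."""
    count = 0
    n = len(perm)
    for i in range(n):
        lo = hi = perm[i]
        for j in range(i + 1, n):
            lo, hi = min(lo, perm[j]), max(hi, perm[j])
            if hi - lo == j - i:  # values of perm[i..j] are consecutive
                count += 1
    return count


def exemplarizations(genome):
    """Yield every subsequence keeping exactly one occurrence of each family."""
    occurrences = {}
    for idx, family in enumerate(genome):
        occurrences.setdefault(family, []).append(idx)
    for choice in product(*occurrences.values()):
        kept = set(choice)
        yield [family for idx, family in enumerate(genome) if idx in kept]


def best_exemplar(g1, g2, measure, better=min):
    """Optimal value of `measure` over all exemplar pairs (exhaustive search)."""
    return better(
        measure(to_permutation(e1, e2))
        for e1 in exemplarizations(g1)
        for e2 in exemplarizations(g2)
    )


if __name__ == "__main__":
    g1, g2 = ["a", "b", "c", "a"], ["b", "c", "a"]
    print(best_exemplar(g1, g2, breakpoints))            # minimize breakpoints
    print(best_exemplar(g1, g2, common_intervals, max))  # maximize common intervals
```

Passing `better=min` suits a dissimilarity such as the breakpoint count, while `better=max` suits a similarity such as the number of common intervals, one of the measures for which the paper establishes NP-completeness.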

Keywords

Comparative genomics; computational complexity; common intervals; maximum adjacency disruption number; summed adjacency disruption number

Domains

Data Structures and Algorithms [cs.DS]; Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]; Computational Complexity [cs.CC]

Main file

DuplicatesTCBB.pdf (615.37 KB)

Origin: Files produced by the author(s)

Contributor: Guillaume Fertin

https://hal.science/hal-00417720

Submitted on: Wednesday, 16 September 2009, 16:36:58

Last modified on: Wednesday, 3 April 2024, 11:42:03

Long-term archiving on: Tuesday, 15 June 2010, 20:04:22

Dates and versions

hal-00417720 , version 1 (16-09-2009)

Identifiers

HAL Id : hal-00417720 , version 1
DOI : 10.1109/TCBB.2007.1069

Cite

Guillaume Blin, Cedric Chauve, Guillaume Fertin, Romeo Rizzi, Stéphane Vialette. Comparing Genomes with Duplications: a Computational Complexity Point of View. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2007, 4 (4), pp.523-534. ⟨10.1109/TCBB.2007.1069⟩. ⟨hal-00417720⟩

Collections

UNIV-NANTES ENPC CNRS UNIV-MLV LINA LINA-COMBI LIGM_ALGO PARISTECH LIGM LS2N UNIV-EIFFEL NANTES-UNIVERSITE JSE2024