Genomes containing Duplicates are Hard to compare

Abstract : In this paper, we are interested in the algorithmic complexity of computing (dis)similarity measures between two genomes when they contain duplicated genes. In that case, there are usually two main ways to compute a given (dis)similarity measure M between two genomes G1 and G2: the first model, that we will call the matching model, consists in making a one-to-one correspondence between genes of G1 and genes of G2, in such a way that M is optimized. The second model, called the exemplar model, consists in keeping in G1 (resp. G2) exactly one copy of each gene, thus deleting all the other copies, in such a way that M is optimized. We present here different results concerning the algorithmic complexity of computing three different similarity measures (number of common intervals, MAD number and SAD number) in those two models, basically showing that the problem becomes NP-complete for each of them as soon as genomes contain duplicates. We show indeed that for common intervals, MAD and SAD, the problem is NP-complete when genes are duplicated in genomes, in both the exemplar and matching models. In the case of MAD and SAD, we actually prove that, under both models, both MAD and SAD problems are APX-hard.
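
To make the exemplar model concrete, here is a minimal brute-force sketch for the common-intervals measure. It is an illustration only, not the authors' method: the function names are hypothetical, genes are treated as unsigned identifiers, both genomes are assumed to contain the same gene families, and every interval (including single genes and the whole genome) is counted. The search enumerates every way of keeping one copy per family, so it takes exponential time in the number of duplicated genes, which is in line with the NP-completeness result stated in the abstract.

```python
from itertools import product


def count_common_intervals(p1, p2):
    """Count gene sets that are contiguous in both duplicate-free genomes."""
    pos2 = {gene: i for i, gene in enumerate(p2)}
    count = 0
    for i in range(len(p1)):
        lo = hi = pos2[p1[i]]
        for j in range(i, len(p1)):
            k = pos2[p1[j]]
            lo, hi = min(lo, k), max(hi, k)
            # p1[i..j] is a common interval iff its genes span exactly
            # j - i + 1 consecutive positions in p2.
            if hi - lo == j - i:
                count += 1
    return count


def exemplarizations(genome):
    """Yield every genome obtained by keeping exactly one copy per gene family."""
    positions = {}
    for i, gene in enumerate(genome):
        positions.setdefault(gene, []).append(i)
    for choice in product(*positions.values()):
        yield [genome[i] for i in sorted(choice)]


def best_exemplar_common_intervals(g1, g2):
    """Maximum number of common intervals over all pairs of exemplarizations."""
    if set(g1) != set(g2):
        raise ValueError("this sketch assumes identical gene families")
    return max(
        count_common_intervals(e1, e2)
        for e1 in exemplarizations(g1)
        for e2 in exemplarizations(g2)
    )
```

For example, with G1 = [1, 2, 3, 1] and G2 = [1, 3, 2], keeping the first copy of gene 1 in G1 gives [1, 2, 3], while keeping the last copy gives [2, 3, 1]; the sketch compares both against G2 and returns the better score.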

Cited literature [9 references]
Contributor : Guillaume Fertin
Submitted on : Thursday, September 17, 2009 - 4:41:28 PM
Last modification on : Wednesday, May 23, 2018 - 3:44:02 PM
Document(s) archived on : Tuesday, June 15, 2010 - 11:51:16 PM


Files produced by the author(s)


  • HAL Id : hal-00418260, version 1



Cedric Chauve, Guillaume Fertin, Romeo Rizzi, Stéphane Vialette. Genomes containing Duplicates are Hard to compare. International Workshop on Bioinformatics Research and Applications (IWBRA 2006), 2006, Reading, United Kingdom. pp.783-790. ⟨hal-00418260⟩


