Robust Seriation and Applications to Cancer Genomics

Abstract : The seriation problem seeks to reorder a set of elements given pairwise similarity information, so that elements with higher similarity are closer in the resulting sequence. When a global ordering consistent with the similarity information exists, an exact spectral solution recovers it in the noiseless case, and seriation is equivalent to the combinatorial 2-SUM problem over permutations, for which several relaxations have been derived. However, in applications such as DNA assembly, similarity values are often heavily corrupted, and the solution of 2-SUM may no longer yield an approximate serial structure on the elements. We introduce the robust seriation problem and show that it is equivalent to a modified 2-SUM problem for a class of similarity matrices modeling those observed in DNA assembly. We explore several relaxations of this modified 2-SUM problem and compare them empirically on both synthetic matrices and real DNA data. We then introduce the problem of seriation with duplications, a generalization of seriation motivated by applications to cancer genome reconstruction. We propose an algorithm involving robust seriation to solve it, and present preliminary results on synthetic data sets.
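
The noiseless case mentioned in the abstract can be illustrated with a minimal sketch of classical spectral seriation: sort the elements by the Fiedler vector of the graph Laplacian of the similarity matrix, and score an ordering with the 2-SUM objective, sum over i, j of A[i, j] * (pos(i) - pos(j))^2. This is not the robust method introduced in the paper. The function names `two_sum` and `spectral_order`, the banded test matrix, and the random seed are assumptions made for illustration only.

```python
import numpy as np

def two_sum(A, order):
    """2-SUM objective: sum_ij A[i, j] * (pos[i] - pos[j])**2.

    order[k] is the index of the element placed at position k.
    """
    pos = np.empty(len(order), dtype=int)
    pos[order] = np.arange(len(order))      # pos[i] = position of element i
    diff = pos[:, None] - pos[None, :]
    return float(np.sum(A * diff ** 2))

def spectral_order(A):
    """Order elements by the Fiedler vector of the Laplacian of A."""
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    fiedler = vecs[:, 1]                    # eigenvector of 2nd smallest eigenvalue
    return np.argsort(fiedler)

# Hypothetical noiseless input: a banded similarity matrix whose entries
# decay with distance in the true ordering, then shuffled.
n = 8
idx = np.arange(n)
R = np.maximum(0, 4 - np.abs(idx[:, None] - idx[None, :])).astype(float)

rng = np.random.default_rng(0)
perm = rng.permutation(n)
A = R[np.ix_(perm, perm)]                   # A[i, j] = R[perm[i], perm[j]]

order = spectral_order(A)
recovered = perm[order]                     # true indices, read along the new order

print("recovered ordering:", recovered)     # expected to be monotone (up or down)
print("2-SUM, shuffled input order:", two_sum(A, np.arange(n)))
print("2-SUM, spectral order:", two_sum(A, order))
```

The ordering is recovered only up to reversal, since the sign of an eigenvector is arbitrary. With corrupted similarities, as in the DNA assembly setting the abstract describes, this plain spectral ordering is exactly what may break down.
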
Document type :
Preprints, Working Papers, ...

https://hal.archives-ouvertes.fr/hal-01851960
Contributor : Antoine Recanati
Submitted on : Tuesday, July 31, 2018 - 12:47:48 PM
Last modification on : Friday, April 19, 2019 - 4:54:50 PM

Identifiers

  • HAL Id : hal-01851960, version 1
  • ARXIV : 1806.00664

Citation

Antoine Recanati, Nicolas Servant, Jean-Philippe Vert, Alexandre d'Aspremont. Robust Seriation and Applications to Cancer Genomics. 2018. ⟨hal-01851960⟩
