Beyond Adjacency Maximization: Scaffold Filling for New String Distances

Abstract: In Genomic Scaffold Filling, one aims at polishing in silico a draft genome, called a scaffold. The scaffold is given in the form of an ordered set of gene sequences, called contigs. Polishing is done by comparing the scaffold with an already complete reference genome from a closely related species. More precisely, given a scaffold S, a reference genome G and a score function f between two genomes, the aim is to complete S by adding the missing genes from G so that the resulting complete genome S* optimizes f(S*, G). In this paper, we extend a model of Jiang et al. [CPM 2016] (i) by allowing the insertion of strings instead of single characters (i.e., some groups of genes may be forced to be inserted together) and (ii) by considering two alternative score functions: the first generalizes the notion of common adjacencies by maximizing the number of common k-mers between S* and G (k-Mer Scaffold Filling); the second minimizes the number of breakpoints between S* and G (Min-Breakpoint Scaffold Filling). We study these problems from the parameterized complexity point of view, providing fixed-parameter tractable (FPT) algorithms for both problems. In particular, we show that k-Mer Scaffold Filling is FPT with respect to the number of additional k-mers realized by the completion of S, which answers an open question of Jiang et al. [CPM 2016]. We also show that Min-Breakpoint Scaffold Filling is FPT with respect to a parameter combining the number of missing genes, the number of gene repetitions and the target distance.
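To make the k-mer objective concrete, here is a minimal Python sketch. It is an illustration only, not the FPT algorithm of the paper: it assumes that common k-mers are counted as a multiset intersection of length-k windows, that genes are inserted one at a time at any position of the scaffold, and it enumerates all fillings by brute force in exponential time. The function names (`kmers`, `common_kmers`, `fillings`, `best_filling`) are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Multiset of the length-k windows of seq (a sequence of gene labels)."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def common_kmers(a, b, k):
    """Assumed score: size of the multiset intersection of the k-mers of a and b."""
    return sum((kmers(a, k) & kmers(b, k)).values())


def fillings(scaffold, missing):
    """Yield every sequence obtained by inserting all missing genes into scaffold."""
    if not missing:
        yield tuple(scaffold)
        return
    gene, rest = missing[0], missing[1:]
    for i in range(len(scaffold) + 1):
        yield from fillings(scaffold[:i] + [gene] + scaffold[i:], rest)


def best_filling(scaffold, genome, k):
    """Brute-force search for a filling S* maximizing common_kmers(S*, G, k)."""
    # Genes of the reference genome G that the scaffold S lacks (with multiplicity).
    missing = list((Counter(genome) - Counter(scaffold)).elements())
    candidates = set(fillings(list(scaffold), missing))
    best = max(candidates, key=lambda s: common_kmers(s, genome, k))
    return best, common_kmers(best, genome, k)


if __name__ == "__main__":
    G = list("abcde")
    S = list("ace")
    print(common_kmers(S, G, 2))
    print(best_filling(S, G, 2))
```

In this toy instance the scaffold shares no 2-mer with the reference genome, and the search inserts the two missing genes so that the filled scaffold shares as many 2-mers with G as possible. The number of candidate fillings grows exponentially with the number of missing genes, which is the kind of blow-up a parameterized algorithm aims to confine to the parameter.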

https://hal.archives-ouvertes.fr/hal-01615671
Contributor: Laurent Bulteau
Submitted on: Thursday, October 12, 2017 - 4:26:00 PM
Last modification on: Tuesday, March 26, 2019 - 9:25:22 AM

File

LIPIcs-CPM-2017-27.pdf
Publisher files allowed on an open archive

Citation

Laurent Bulteau, Guillaume Fertin, Christian Komusiewicz. Beyond Adjacency Maximization: Scaffold Filling for New String Distances. 28th Annual Symposium on Combinatorial Pattern Matching, 2017, Warsaw, Poland. 〈10.4230/LIPIcs.CPM.2017.27〉. 〈hal-01615671〉
