Beyond Adjacency Maximization: Scaffold Filling for New String Distances

Abstract: In Genomic Scaffold Filling, one aims at polishing in silico a draft genome, called a scaffold. The scaffold is given as an ordered set of gene sequences, called contigs. Polishing is done by comparing the scaffold with an already complete reference genome from a closely related species. More precisely, given a scaffold S, a reference genome G and a score function f between two genomes, the aim is to complete S by adding the missing genes from G so that the resulting complete genome S* optimizes f(S*, G). In this paper, we extend a model of Jiang et al. [CPM 2016] (i) by allowing the insertion of strings instead of single characters (i.e., some groups of genes may be forced to be inserted together) and (ii) by considering two alternative score functions: the first generalizes the notion of common adjacencies by maximizing the number of common k-mers between S* and G (k-Mer Scaffold Filling); the second minimizes the number of breakpoints between S* and G (Min-Breakpoint Scaffold Filling). We study these problems from the parameterized complexity point of view, providing fixed-parameter tractable (FPT) algorithms for both problems. In particular, we show that k-Mer Scaffold Filling is FPT with respect to the number of additional k-mers realized by the completion of S, which answers an open question of Jiang et al. [CPM 2016]. We also show that Min-Breakpoint Scaffold Filling is FPT with respect to a parameter combining the number of missing genes, the number of gene repetitions and the target distance.
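To make the k-mer objective concrete, here is a minimal Python sketch of a score function that counts the k-mers shared by a completed scaffold and the reference. It is an illustration only, not the algorithm of the paper: it assumes that a k-mer is an ordered window of k consecutive genes, that shared k-mers are counted with multiplicity, and that a single block of missing genes may be inserted at any position. The names `kmer_counts`, `common_kmers` and `best_insertion` are hypothetical.

```python
from collections import Counter


def kmer_counts(genome, k):
    """Multiset of all windows of k consecutive genes (assumed k-mer definition)."""
    genome = list(genome)
    return Counter(tuple(genome[i:i + k]) for i in range(len(genome) - k + 1))


def common_kmers(s, g, k):
    """Number of k-mers shared by s and g, counted with multiplicity."""
    shared = kmer_counts(s, k) & kmer_counts(g, k)  # multiset intersection
    return sum(shared.values())


def best_insertion(scaffold, block, reference, k):
    """Brute force: try every position for one block of missing genes.

    Returns (score, completed_genome) for the position that maximizes the
    number of common k-mers with the reference. The block is inserted as a
    whole, mirroring the idea that some genes must be inserted together.
    """
    scaffold, block = list(scaffold), list(block)
    candidates = []
    for i in range(len(scaffold) + 1):
        completed = scaffold[:i] + block + scaffold[i:]
        candidates.append((common_kmers(completed, reference, k), completed))
    return max(candidates)


if __name__ == "__main__":
    reference = "abcde"  # each character stands for one gene
    scaffold = "abde"    # gene "c" is missing
    score, completed = best_insertion(scaffold, "c", reference, k=2)
    print(score, "".join(completed))
```

This exhaustive search only handles one inserted block and serves to show what the score measures; the fixed-parameter algorithms announced in the abstract address the general problem.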
Document type: Conference paper
28th Annual Symposium on Combinatorial Pattern Matching, 2017, Warsaw, Poland. 〈10.4230/LIPIcs.CPM.2017.27〉

https://hal.archives-ouvertes.fr/hal-01615671
Contributor: Laurent Bulteau
Submitted on: Thursday, 12 October 2017 - 16:26:00
Last modified on: Thursday, 5 July 2018 - 14:45:43

File

LIPIcs-CPM-2017-27.pdf
Publisher files allowed on an open archive

Citation

Laurent Bulteau, Guillaume Fertin, Christian Komusiewicz. Beyond Adjacency Maximization: Scaffold Filling for New String Distances. 28th Annual Symposium on Combinatorial Pattern Matching, 2017, Warsaw, Poland. 〈10.4230/LIPIcs.CPM.2017.27〉. 〈hal-01615671〉
