Designing RNA Secondary Structures is Hard

Edouard Bonnet; Paweł Rzążewski; Florian Sikora

Conference paper (2018)

Edouard Bonnet

Role: Author
PersonId: 171728
IdHAL: edouard-bonnet
ORCID: 0000-0002-1653-5822
IdRef: 182698602

Modèles de calcul, Complexité, Combinatoire

Paweł Rzążewski

Role: Author

Faculty of Mathematics and Information Science [Warszawa]

Florian Sikora

Role: Author
PersonId: 742949
IdHAL: florian-sikora
ORCID: 0000-0003-2670-6258
IdRef: 158172590

Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision

Abstract

An RNA sequence is a word over the four-letter alphabet {A, C, G, U}, whose elements are called bases. RNA sequences fold into secondary structures in which some bases pair with one another while others remain unpaired. Pseudoknot-free secondary structures can be represented as well-parenthesized expressions with additional dots, where pairs of matching parentheses symbolize paired bases and dots symbolize unpaired bases. The two fundamental problems in RNA algorithmics are to predict how sequences fold within some energy model and to design sequences of bases that fold into targeted secondary structures. Predicting how a given RNA sequence folds into a pseudoknot-free secondary structure has been known to be solvable in cubic time since the eighties, and in truly subcubic time by a recent result of Bringmann et al. (FOCS 2016), whereas Lyngsø showed that it is NP-complete if pseudoknots are allowed (ICALP 2004). In stark contrast, it is unknown whether designing a given RNA secondary structure is a tractable task; this was raised as a challenging open question by Anne Condon (ICALP 2003). Because of its crucial importance in a number of fields such as pharmaceutical research and biochemistry, there are dozens of heuristics and software libraries dedicated to RNA secondary structure design. It is therefore rather surprising that the computational complexity of this central problem in bioinformatics has remained unsettled for decades. In this paper we show that, in the simplest energy model, the Watson-Crick model, the design of secondary structures is NP-complete if one adds natural constraints of the form "index i of the sequence has to be labeled by base b". This negative result suggests that the same lower bound holds for more realistic energy models.
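As an illustration of the dot-bracket notation and of folding in the Watson-Crick model, the following Python sketch (not taken from the paper) parses a pseudoknot-free structure into its base pairs and computes the maximum number of pairs a sequence can form. It assumes a simplified reading of the model: only A-U and C-G pairs are allowed (no G-U wobble pairs), there is no minimum hairpin-loop length, and a folding is optimal when it maximizes the number of pairs. The function names `parse_dot_bracket` and `max_pairs` are hypothetical; `max_pairs` is a textbook Nussinov-style O(n^3) dynamic program, the kind of cubic-time folding algorithm mentioned above, not the subcubic algorithm of Bringmann et al.

```python
# Assumption: only Watson-Crick pairs count (no G-U wobble pairs).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the base pairs (i, j), i < j, encoded by a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {j}")
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at index {j}")
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return sorted(pairs)


def max_pairs(seq: str) -> int:
    """Maximum number of non-crossing Watson-Crick pairs (Nussinov-style DP).

    Assumption: no minimum hairpin-loop length, so adjacent bases may pair.
    """
    n = len(seq)
    # best[i][j] = optimum on seq[i..j] (inclusive); length-1 intervals score 0.
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # Case 1: base i stays unpaired.
            score = best[i + 1][j]
            # Case 2: base i pairs with some base k in (i, j].
            for k in range(i + 1, j + 1):
                if (seq[i], seq[k]) in WATSON_CRICK:
                    inside = best[i + 1][k - 1] if k - 1 >= i + 1 else 0
                    outside = best[k + 1][j] if k + 1 <= j else 0
                    score = max(score, inside + 1 + outside)
            best[i][j] = score
    return best[0][n - 1] if n else 0
```

For instance, `parse_dot_bracket("((..))")` returns `[(0, 5), (1, 4)]`, and `max_pairs("GGAAUCC")` returns 3 (three nested pairs: G-C, G-C and A-U).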
It is noteworthy that the additional constraints are by no means artificial: they are supported by all RNA design software and correspond to actual practice (see, for example, the instances of the EteRNA project). The main ingredients of our reduction from a variant of 3-SAT are arches of parentheses of different widths, a linear order interleaving variables and clauses, and an intended rematching strategy that increases the number of pairs if and only if the three literals of the same clause are false. The correctness proof is also quite intricate; it relies on the polynomial-time algorithm of Haleš et al. (Algorithmica 2016) for the design of saturated structures (secondary structures without dots), on counting arguments, and on a concise case analysis.
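The constrained design problem discussed above can be made concrete with an exhaustive search on tiny instances. The Python sketch below is an illustration under stated assumptions, not the paper's reduction or algorithm: it uses the same simplified Watson-Crick model as before (A-U and C-G pairs only, no minimum loop length, optimal means maximum number of pairs), and it assumes that a sequence designs a target when the target is its unique pseudoknot-free structure with the maximum number of pairs. Position constraints of the form "index i has base b" are passed as a hypothetical dictionary `constraints`, and the names `optimum_and_count`, `designs` and `find_design` are also hypothetical. The search enumerates 4^k sequences for k unconstrained positions, so it is only usable on very short targets.

```python
from itertools import product

# Assumption: only Watson-Crick pairs (no G-U wobble), no minimum loop length.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def parse_dot_bracket(structure):
    """Return the sorted base pairs (i, j), i < j, of a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return sorted(pairs)


def optimum_and_count(seq):
    """Max number of non-crossing pairs and how many structures attain it.

    Each structure on [i..j] is counted once: either base i is unpaired,
    or it pairs with exactly one base k, splitting the interval in two.
    """
    n = len(seq)
    table = {}

    def get(i, j):
        # An empty interval has exactly one (empty) structure with 0 pairs.
        return (0, 1) if i > j else table[(i, j)]

    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best, ways = get(i + 1, j)  # base i unpaired
            for k in range(i + 1, j + 1):  # base i paired with base k
                if (seq[i], seq[k]) in WATSON_CRICK:
                    b_in, w_in = get(i + 1, k - 1)
                    b_out, w_out = get(k + 1, j)
                    cand = b_in + 1 + b_out
                    if cand > best:
                        best, ways = cand, w_in * w_out
                    elif cand == best:
                        ways += w_in * w_out
            table[(i, j)] = (best, ways)
    return get(0, n - 1)


def designs(seq, target):
    """True if target is the unique maximum-pair structure of seq (assumed definition)."""
    pairs = parse_dot_bracket(target)
    if any((seq[i], seq[j]) not in WATSON_CRICK for i, j in pairs):
        return False
    best, count = optimum_and_count(seq)
    return len(pairs) == best and count == 1


def find_design(target, constraints):
    """Exhaustively search for a sequence respecting constraints that designs target."""
    n = len(target)
    free = [i for i in range(n) if i not in constraints]
    for letters in product("ACGU", repeat=len(free)):
        seq = [""] * n
        for i, base in constraints.items():
            seq[i] = base
        for i, base in zip(free, letters):
            seq[i] = base
        word = "".join(seq)
        if designs(word, target):
            return word
    return None
```

For example, `find_design("(.)", {})` returns `"ACU"`, while `find_design("()", {0: "A", 1: "A"})` returns `None` because the constraints force an A-A pair. Likewise, `designs("GCGC", "(())")` is `False`, since `"()()"` is an equally good folding of GCGC.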

Subjects

Computer Science [cs], Bioinformatics [q-bio.QM], Computational Complexity [cs.CC]

Main file

main.pdf (550.23 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-01991541

Submitted on: Wednesday, 23 January 2019, 21:58:12

Last modified on: Friday, 19 April 2024, 16:18:54

Dates and versions

hal-01991541, version 1 (23-01-2019)

Identifiers

HAL Id: hal-01991541, version 1

Cite

Edouard Bonnet, Paweł Rzążewski, Florian Sikora. Designing RNA Secondary Structures is Hard. RECOMB 2018, Apr 2018, Paris, France. ⟨hal-01991541⟩

Collections

ENS-LYON CNRS INRIA UNIV-LYON1 UNIV-DAUPHINE LAMSADE-DAUPHINE PSL UDL
