Conference paper, Year: 2006

Learning Automata on Protein Sequences

Abstract

Pattern discovery is limited to position-specific characterizations, such as Prosite patterns or profile HMMs, which cannot handle, for instance, dependencies between amino acids that are distant in the sequence of a protein but close in its three-dimensional structure. To overcome these limitations, we propose to learn automata on proteins. Inspired by grammatical inference and multiple alignment techniques, we introduce a sequence-driven approach based on merging ordered partial local multiple alignments (PLMA) under preservation or consistency constraints, and on identifying informative positions with respect to physico-chemical properties. The quality of the characterization is assessed experimentally on two difficult sets of proteins by comparison with the (semi-)manually designed patterns of Prosite and with state-of-the-art pattern discovery algorithms. Further leave-one-out experiments show that learning more precise automata improves accuracy by increasing the classification margins.
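To make the "position-specific" limitation mentioned in the abstract concrete, here is a minimal sketch of how a Prosite-style pattern can be translated into a regular expression. It is not the authors' method; it only illustrates the baseline they compare against. The function name `prosite_to_regex`, the example pattern, and the two sequences are hypothetical and chosen for illustration; only the core Prosite syntax (`x`, `[...]`, `{...}`, `(n)`, `(n,m)`, `<`, `>`) is handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite-style pattern into a Python regular expression.

    Illustrative helper (hypothetical name): covers only the core syntax.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # '<' anchors at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # '>' anchors at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                  # 'x' means any amino acid
            core = "."
        elif core.startswith("{"):       # {ABC} means any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        # [ABC] and single letters are already valid regex syntax.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# Hypothetical pattern, used only to exercise the translation.
regex = prosite_to_regex("C-x(2,4)-C-{P}-[LIVM]")
hit = re.search(regex, "AACGGCALKK")
miss = re.search(regex, "AACGGCPLKK")
```

Each element of such a pattern constrains one position independently of the others, so the pattern cannot state that the residue chosen at one position depends on the residue found at a distant one. This is the kind of dependency the abstract proposes to capture with automata.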
Main file
coste_kerbellec_jobim06.pdf (403.58 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00180429 , version 1 (19-10-2007)

Identifiers

  • HAL Id : inria-00180429 , version 1

Cite

François Coste, Goulven Kerbellec. Learning Automata on Protein Sequences. JOBIM, Jul 2006, Bordeaux, France. pp. 199-210. ⟨inria-00180429⟩