Basic local alignment search tool, Journal of Molecular Biology, vol.215, issue.3, pp.403-410, 1990. ,
DOI : 10.1016/S0022-2836(05)80360-2
EFICAz2 : enzyme function inference by a combined approach enhanced by machine learning, BMC Bioinformatics, vol.10, issue.11, 2009. ,
The three-dimensional structures of two β-agarases, Journal of Biological Chemistry, vol.278, issue.47, pp.47171-47180, 2003. ,
Predicting Protein Secondary Structure Using Stochastic Tree Grammars, Machine Learning, vol.29, issue.2-3, pp.275-301, 1997. ,
Speeding Up the DIALIGN Multiple Alignment Program by Using the 'Greedy Alignment of BIOlogical Sequences LIBrary' (GABIOS-LIB), Computational Biology, pp.1-11, 2001. ,
Inductive inference of formal languages from positive data, Information and Control, vol.45, issue.2, pp.117-135, 1980. ,
DOI : 10.1016/S0019-9958(80)90285-5
Inference of Reversible Languages, J. ACM, vol.29, issue.3, pp.741-765, 1982. ,
Learning regular sets from queries and counterexamples, Information and Computation, pp.87-106, 1987. ,
Structural SCOP Superfamily Level Classification Using Unsupervised Machine Learning, IEEE/ACM Trans. Comput. Biology Bioinform, vol.9, issue.2, pp.601-608, 2012. ,
Evolutionary Genomics of the HAD Superfamily: Understanding the Structural Adaptations and Catalytic Diversity in a Superfamily of Phosphoesterases and Allied Enzymes, Journal of Molecular Biology, vol.361, issue.5, pp.1003-1034, 2006. ,
DOI : 10.1016/j.jmb.2006.06.049
The ENZYME database in 2000, Nucleic Acids Research, vol.28, issue.1, pp.304-305, 2000. ,
DOI : 10.1093/nar/28.1.304
Bioinformatics : The Machine Learning Approach, 2001. ,
MEME Suite : tools for motif discovery and searching, Nucleic Acids Research, vol.37, issue.2, pp.202-208, 2009. ,
The Pfam protein families database, Nucleic Acids Research, vol.32, issue.1, pp.D138-D141, 2004. ,
URL : https://hal.archives-ouvertes.fr/hal-01294685
Answer set programming at a glance, Communications of the ACM, vol.54, issue.12, pp.92-103, 2011. ,
DOI : 10.1145/2043174.2043195
Automatic discovery of cross-family sequence features associated with protein function, BMC Bioinformatics, vol.7, issue.1, p.16, 2006. ,
DOI : 10.1186/1471-2105-7-16
Approaches to the Automatic Discovery of Patterns in Biosequences, Journal of Computational Biology, vol.5, issue.2, pp.279-305, 1998. ,
DOI : 10.1089/cmb.1998.5.279
Biclustering in data mining, Computers & Operations Research, vol.35, issue.9, pp.2964-2987, 2008. ,
DOI : 10.1016/j.cor.2007.01.005
HMMSTR : a hidden Markov model for local sequence-structure correlations in proteins, Journal of Molecular Biology, vol.301, issue.1, pp.173-190, 2000. ,
The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing, Algorithms, vol.4, issue.4, pp.262-284, 2011. ,
DOI : 10.3390/a4040262
URL : https://hal.archives-ouvertes.fr/inria-00638445
The Carbohydrate-Active EnZymes database (CAZy) : an expert resource for glycogenomics, Nucleic Acids Research, vol.37, issue.1, pp.233-238, 2009. ,
Polynomial Identification in the Limit of Substitutable Context-free Languages, Journal of Machine Learning Research, vol.8, pp.1725-1745, 2007. ,
Automated enzyme classification by formal concept analysis, Formal Concept Analysis, pp.235-250 ,
Local Substitutability for Sequence Generalization, ICGI 2012 Conference Proceedings, pp.97-111, 2012. ,
Aspects of the theory of syntax, 1964. ,
Grammatical Representations of Macromolecular Structure, Journal of Computational Biology, vol.13, issue.5, pp.1077-1100 ,
The relation between the divergence of sequence and structure in proteins, The EMBO Journal, issue.4, pp.823-826 ,
Unsupervised induction of stochastic context-free grammars using distributional clustering, Proceedings of the 2001 workshop on Computational Natural Language Learning, ConLL '01, p.13, 2001. ,
DOI : 10.3115/1117822.1117831
Combining distributional and morphological information for part of speech induction, Proceedings of the tenth conference on European chapter, pp.59-66, 2003. ,
Distributional Learning of some Context-free Languages with a Minimally Adequate Teacher, Grammatical Inference : Theoretical Results and Applications. Proceedings of the International Colloquium on Grammatical Inference, number 6339 in Lecture Notes in Computer Science, pp.24-37 ,
Learning Context Free Grammars with the Syntactic Concept Lattice, Grammatical Inference : Theoretical Results and Applications. Proceedings of the International Colloquium on Grammatical Inference, number 6339 in Lecture Notes in Computer Science, pp.38-51, 2010. ,
DOI : 10.1007/978-3-642-15488-1_5
A Language Theoretic Approach to Syntactic Structure, The Mathematics of Language, pp.39-56, 2011. ,
DOI : 10.1017/CBO9780511791222
Learning Trees from Strings : A Strong Learning Algorithm for some Context-Free Grammars, Journal of Machine Learning Research, vol.14, pp.3537-3559, 2014. ,
Grammatical inference by hill climbing, Information Sciences, vol.10, issue.2, pp.59-80, 1976. ,
Simplicity : A unifying principle in cognitive science ?, Trends in Cognitive Sciences ,
Multiple alignment for structural, functional, or phylogenetic analyses of homologous sequences ,
[Day73] Margaret Oakley Dayhoff, Atlas of Protein Sequence and Structure : Supplement No. 1 ,
Prediction of Enzyme Classification from Protein Sequence without the Use of Sequence Similarity, ISMB, pp.92-99, AAAI, 1997. ,
[dlH97] Colin de la Higuera, Characteristic Sets for Polynomial Grammatical Inference, Machine Learning, pp.125-138, 1997. ,
Some classes of regular languages identifiable in the limit from positive data ,
A stochastic context free grammar based framework for analysis of protein sequences ,
Enzyme-specific profiles for genome annotation : PRIAM, Nucleic Acids Research ,
Bioinformatics : Sequence, Structure, and Databanks ,
Grammatical Inference : algorithms and applications ,
[DMV94] P. Dupont, L. Miclet, and E. Vidal, What is the search space of the regular inference ?, Proceedings of the Second International Colloquium on Grammatical Inference (ICGI'94) ,
Practical limits of function prediction, pp.98-107 ,
HMMER user's guide : biological sequence analysis using profile hidden Markov models, 1998. ,
[EPS+05] Improving Protein Function Prediction Using the Hierarchical Structure of the Gene Ontology, CIBCB, pp.354-363 ,
Inférence grammaticale de langages hors-contextes, 2006. ,
SCOPe: Structural Classification of Proteins-extended, integrating SCOP and ASTRAL data and classification of new structures, Nucleic Acids Research, vol.42, issue.D1, pp.D304-D309, 2014. ,
DOI : 10.1093/nar/gkt1240
Inférence d'automates finis non déterministes par gestion de l'ambiguïté, en vue d'applications en bioinformatique, 2003. ,
Sequence divergence, functional constraint, and selection in protein evolution, Annual Review of Genomics and Human Genetics, pp.213-235, 2003. ,
Can sequence determine function ? ,
Evaluation and selection of biases in machine learning, Machine Learning, pp.5-22, 1995. ,
From complete genome sequence to 'complete' understanding ?, Trends in Biotechnology, vol.28, issue.8, pp.398-406, 2010. ,
Conflict-driven answer set solving : From theory to practice, Artif. Intell, vol.187, pp.52-89, 2012. ,
Profile analysis : detection of distantly related proteins, Proceedings of the National Academy of Sciences, pp.4355-4358, 1987. ,
Clustering bipartite graphs in terms of approximate formal concepts and sub-contexts, pp.1125-1142 ,
Language identification in the limit, Information and Control, vol.10, issue.5, pp.447-474, 1967. ,
Complexity of Automaton Identification from Given Data, Information and Control, vol.37, issue.3, pp.302-320, 1978. ,
A minimum description length approach to grammar inference, Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language, pp.203-216, 1994. ,
DOI : 10.1007/3-540-60925-3_48
Correlated mutations and residue contacts in proteins, Proteins: Structure, Function, and Genetics, vol.18, issue.4, pp.309-317, 1994. ,
DOI : 10.1002/prot.340180402
Functional divergence in protein (family) sequence evolution, Genetica, vol.118, issue.2-3, p.133, 2003. ,
DOI : 10.1007/978-94-010-0229-5_4
Inference of k-Testable Languages in the Strict Sense and Application to Syntactic Pattern Recognition, pp.920-925 ,
Local Languages, the Successor Method, and a Step Towards a General Methodology for the Inference of Regular Grammars, Pattern Analysis and Machine Intelligence, pp.841-845, nov. 1987. ,
[GVO90] P. Garcia, E. Vidal, and J. Oncina, Learning Locally Testable Languages in the Strict Sense, First int. workshop on Algorithmic Learning Theory, ALT'90, pp.325-338, 1990. ,
MACiE (Mechanism, Annotation and Classification in Enzymes): novel tools for searching catalytic mechanisms, Nucleic Acids Research, vol.35, issue.Database, pp.515-520, 2007. ,
DOI : 10.1093/nar/gkl774
The PROSITE database, Nucleic Acids Research, vol.34, issue.90001, pp.227-230, 2006. ,
DOI : 10.1093/nar/gkj063
Transfer of carbohydrate-active enzymes from marine bacteria to Japanese gut microbiota, Nature, vol.464, issue.7290, pp.908-912, 2010. ,
DOI : 10.1038/nature08937
Formal language theory and DNA : An analysis of the generative capacity of specific recombinant behaviours, pp.737-759, 1987. ,
Amino acid substitution matrices from protein blocks, Proceedings of the National Academy of Sciences, vol.89, pp.10915-10919 ,
Automated construction and graphical presentation of protein blocks from unaligned sequences, Gene, vol.163, 1995. ,
InterPro in 2011 : new developments in the family and domain prediction database, Nucleic Acids Research, vol.40, issue.D1, pp.D306-D312, 2012. ,
Classification by Selecting Plausible Formal Concepts in a Concept Lattice, Workshop on Formal Concept Analysis meets Information Retrieval (FCAIR2013), pp.22-35, 2013. ,
Biocatalysis by Dehalogenating Enzymes, Advances in Applied Microbiology, vol.61, pp.233-252, 2007. ,
DOI : 10.1016/S0065-2164(06)61006-X
Scoring Function for Pattern Discovery Programs Taking Into Account Sequence Diversity, 1996. ,
Apprentissage d'automates modélisant des familles de séquences protéiques, 2008. ,
Evolution of protein structures and functions, Current Opinion in Structural Biology, vol.12, issue.3, pp.400-408, 2002. ,
Application of a theory of enzyme specificity to protein synthesis, Proceedings of the National Academy of Sciences of the United States of America, vol.44, issue.2, p.98, 1958. ,
Learning deterministic even linear languages from positive examples, Theoretical Computer Science, vol.185, issue.1, pp.63-79, 1997. ,
Generating decision tree from lattice for classification, 7th International Conference on Applied Informatics, pp.377-384, 2007. ,
Genome-wide Analysis of Substrate Specificities of the Escherichia coli Haloacid Dehalogenase-like Phosphatase Family, Journal of Biological Chemistry, vol.281, issue.47, pp.36149-36161, 2006. ,
Computer Analysis of Bacterial Haloacid Dehalogenases Defines a Large Superfamily of Hydrolases with Diverse Specificity : Application of an Iterative Approach to Database Search, Journal of Molecular Biology, vol.244, issue.1, pp.125-132, 1994. ,
The human genome project. Cracking the genetic code of life, 1991. ,
Random DFA's can be approximately learned from sparse uniform examples, Proceedings of the fifth annual workshop on Computational learning theory, COLT '92, pp.45-52, 1992. ,
DOI : 10.1145/130385.130390
Inférence grammaticale sur des alphabets ordonnés : Application à la découverte de motifs dans des familles de protéines, 2005. ,
Bayesian models for multiple local sequence alignment and Gibbs sampling strategies, Journal of the American Statistical Association, vol.90, issue.432, pp.1156-1170, 1995. ,
MIPS : a database for genomes and protein sequences, Nucleic Acids Research, vol.27, issue.1, pp.44-48, 1999. ,
The need for biases in learning generalizations, 1980. ,
DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment, Bioinformatics, vol.15, issue.3, pp.211-218, 1999. ,
DOI : 10.1093/bioinformatics/15.3.211
Comparison of the PAM and BLOSUM amino acid substitution matrices, Cold Spring Harbor Protocols, issue.6, p.59, 2008. ,
The Ectocarpus genome and the independent evolution of multicellularity in brown algae, Nature, issue.7298, pp.617-621, 2010. ,
Clustering Sets of Objects Using Concepts-Objects Bipartite Graphs, Lecture Notes in Computer Science, vol.7520, pp.420-432, 2012. ,
DOI : 10.1007/978-3-642-33362-0_32
URL : https://hal.archives-ouvertes.fr/hal-00992046
A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, vol.48, issue.3, pp.443-453, 1970. ,
Inferring regular languages in polynomial update time, Pattern Recognition and Image Analysis, pp.49-61, 1992. ,
A helpful result for proving inherent ambiguity, Theory of Computing Systems, pp.191-194, 1968. ,
DOI : 10.1007/BF01694004
CATH : A Hierarchic Classification of Protein Domain Structures, Structure, vol.5, issue.8, pp.1093-1108, 1997. ,
Using secondary structures to measure the geometry of a protein ,
Inductive Inference, DFAs, and Computational Complexity, Proceedings of International Workshop on Analogical and Inductive Inference (AII), pp.18-44, 1989. ,
IgTM: An algorithm to predict transmembrane domains and topology in proteins, BMC Bioinformatics, vol.9, issue.1, 2008. ,
DOI : 10.1186/1471-2105-9-367
Protein Motif Prediction by Grammatical Inference, Sakakibara et al. [SKS+06], pp.175-187 ,
DOI : 10.1007/11872436_15
Generalized phrase structure grammars, head grammars and natural language, 1984. ,
Mémoire sur la diastase, les principaux produits de ses réactions ,
A large-scale evaluation of computational protein function prediction, 2013. ,
Modeling by the shortest data description, Automatica, vol.14, pp.465-471, 1978. ,
The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes, Nucleic Acids Research, vol.32, issue.18, pp.5539-5545 ,
Learning Classification Rules Using Lattices, Lecture Notes in Computer Science, vol.912, pp.343-346, 1995. ,
RNA Modeling by Combining Stochastic Context-Free Grammars and n-Gram Models, IJPRAI, pp.309-315 ,
[SBA+12] Lieven Sterck, ORCAE : online resource for community annotation of eukaryotes, p.1041 ,
Stochastic Context-Free Grammars for Modeling RNA, HICSS (5), 1994. ,
A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase, Journal of Molecular Biology, vol.94, issue.3, pp.441-448, 1975. ,
EzyPred : a top-down approach for predicting enzyme functional classes and subclasses, Biochem. Biophys. Res. Commun, vol.364, issue.1 ,
Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics, 1995. ,
[SCP+13] Ida, New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures, pp.490-498 ,
DOI : 10.1093/nar/gks1211
The language of genes, Nature, vol.420, issue.6912, pp.211-217, 2002. ,
DOI : 10.1038/29667
Enzymes with lid-gated active sites must operate by an induced fit mechanism instead of conformational selection, Proceedings of the National Academy of Sciences, vol.105, issue.37, pp.13829-13834, 2008. ,
Unsupervised learning of natural languages, Proceedings of the National Academy of Sciences, vol.102, issue.33, pp.11629-11634, 2005. ,
DOI : 10.1073/pnas.0409746102
Sequence logos : a new way to display consensus sequences, Nucleic Acids Research, vol.18, issue.20, pp.6097-6100, 1990. ,
Human HAD phosphatases: structure, mechanism, and roles in health and disease, FEBS Journal, vol.14, issue.Pt 2, pp.549-571, 2013. ,
DOI : 10.1111/j.1742-4658.2012.08633.x
Identification of common molecular subsequences, Journal of Molecular Biology, pp.195-197, 1981. ,
Enzyme Function Prediction with Interpretable Models, 2009. ,
DOI : 10.1007/978-1-59745-243-4_17
History of the enzyme nomenclature system, Bioinformatics, vol.16, issue.1, pp.34-40, 2000. ,
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Research, vol.22, issue.22, pp.4673-4680, 1994. ,
DOI : 10.1093/nar/22.22.4673
Principles of risk minimization for learning theory, Advances in Neural Information Processing Systems, pp.831-838, 1992. ,
ABL : Alignment-Based Learning, COLING 18, pp.961-967 ,
Molecular structure of nucleic acids, Nature, vol.171, issue.4356, pp.737-738, 1953. ,
The nomenclature of multiple enzyme forms, Experientia, vol.20, issue.10, p.592, 1964. ,
Enzyme nomenclature : a personal retrospective, The FASEB Journal, vol.7, issue.12, pp.1192-1194, 1993. ,
Restructuring lattice theory : An approach based on hierarchies of concepts, Ordered Sets, pp.445-470, 1982. ,
Assessing annotation transfer for genomics : quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores, Journal of Molecular Biology, vol.297, issue.1, pp.233-249, 2000. ,
Closed-Label Concept Lattice Based Rule Extraction Approach, Lecture Notes in Computer Science, vol.1, issue.6, pp.690-698, 2011. ,
DOI : 10.1007/978-3-642-59830-2
Language acquisition and the discovery of phrase structure, Language and Speech, vol.23, issue.3, pp.255-269, 1980. ,
Evolution of Protein Sequences and Structures, Journal of Molecular Biology, vol.291, issue.4, pp.977-995, 1999. ,
SVM-based Method for Predicting Enzyme Function in a Hierarchical Context, The Fourth International Conference on Computational Systems Biology (ISB2010), pp.119-127, 2010. ,
Learning Local Languages and its Application to Protein α-Chain Identification, HICSS (5), pp.113-122, 1994. ,
Identification in the Limit of (k,l)-Substitutable Context-Free Languages, Proceedings of the 9th international colloquium conference on Grammatical inference : theoretical results and applications, ICGI'08, pp.266-279, 2008. ,
Efficient learning of multiple context-free languages with multidimensional substitutability from positive data, Theoretical Computer Science, vol.412, issue.19, pp.1821-1831, 2011. ,
The impact of next-generation sequencing on genomics, Journal of Genetics and Genomics, vol.38, issue.3, pp.95-109, 2011. ,
1, each path, drawn as a series of arcs, represents a sequence, and the blocks are drawn as rectangles covering certain positions of the sequences. The color of each sequence indicates its membership in a given class, defined as follows: Agarase (EC ,