Polynomial Identification in the limit of context-free substitutable languages

Abstract : This paper formalises the idea of substitutability introduced by Zellig Harris in the 1950s and makes it the basis for an algorithm that learns a subclass of context-free languages from positive data only. We show that there is a polynomial characteristic set, and thus prove polynomial identification in the limit of this class. We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. It transpires that it is not necessary to identify constituents in order to learn a context-free language: it is sufficient to identify the syntactic congruence, and the operations of the syntactic monoid can be converted into a context-free grammar. We also discuss modifications to the algorithm that produce a much more compact reduction system rather than a context-free grammar. We discuss the relationship to Angluin's notion of reversibility for regular languages. We also demonstrate that an implementation of this algorithm is capable of learning a classic example of structure-dependent syntax in English; this constitutes a refutation of an argument that has been used in support of nativist theories of language.
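
The substitutability idea in the abstract can be illustrated with a small sketch. The Python below is an illustration under stated assumptions, not the paper's implementation: it assumes whitespace-tokenised sentences, treats two substrings as interchangeable as soon as they share a single context (l, r) in the sample, and takes the connected components of that relation as approximate congruence classes. The function names `substring_contexts` and `congruence_classes` are hypothetical.

```python
from collections import defaultdict


def substring_contexts(sample):
    """Map each non-empty substring u to the set of contexts (l, r)
    such that l + u + r is a sentence of the sample."""
    contexts = defaultdict(set)
    for sentence in sample:
        words = tuple(sentence.split())  # assumption: whitespace tokenisation
        n = len(words)
        for i in range(n):
            for j in range(i + 1, n + 1):
                contexts[words[i:j]].add((words[:i], words[j:]))
    return contexts


def congruence_classes(sample):
    """Group substrings that share at least one context, then close the
    relation transitively (connected components via union-find)."""
    contexts = substring_contexts(sample)
    parent = {u: u for u in contexts}

    def find(u):
        # Find the representative of u, compressing the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Invert the map: context -> substrings observed in that context.
    by_context = defaultdict(list)
    for u, ctxs in contexts.items():
        for c in ctxs:
            by_context[c].append(u)

    # Substrings seen in the same context are merged into one class.
    for members in by_context.values():
        first = members[0]
        for other in members[1:]:
            parent[find(other)] = find(first)

    classes = defaultdict(set)
    for u in contexts:
        classes[find(u)].add(u)
    return list(classes.values())


if __name__ == "__main__":
    for cls in congruence_classes(["the man died", "the dog died"]):
        print(sorted(" ".join(u) for u in cls))
```

On the two-sentence sample `the man died` and `the dog died`, `man` and `dog` share the context (`the`, `died`) and so fall into the same class, while `the` stays in a class of its own. Each class would then play the role of a nonterminal when a grammar is read off the sample.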

https://hal.archives-ouvertes.fr/hal-00186889
Contributor : Rémi Eyraud
Submitted on : Wednesday, November 14, 2007 - 10:38:42 AM
Last modification on : Thursday, June 27, 2019 - 1:36:06 PM
Long-term archiving on : Monday, April 12, 2010 - 1:58:35 AM

File

JMLR_clark_eyraud_07.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-00186889, version 1

Citation

A. Clark, Rémi Eyraud. Polynomial Identification in the limit of context-free substitutable languages. Journal of Machine Learning Research, Microtome Publishing, 2007, 8, pp. 1725-1745. ⟨hal-00186889⟩
