HAL : inria-00638445, version 1

Algorithms, 4 (2011) 262-284
The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing
Rafael Carrascosa 1, François Coste 2, Matthias Gallé 2, Gabriel Infante-Lopez 1
Collaboration(s): Cooperation Inria/Mincyt
(26/10/2011)

The smallest grammar problem, namely finding a smallest context-free grammar that generates exactly one sequence, is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose a new perspective on this problem by splitting it into two tasks: (1) choosing which words will be the constituents of the grammar and (2) searching for the smallest grammar given this set of constituents. We show how to solve the second task in polynomial time by parsing longer constituents with smaller ones. We propose new algorithms, based on classical practical algorithms, that use this optimization to find small grammars. Our algorithms consistently find smaller grammars on a classical benchmark, reducing the size by 10% in some cases. Moreover, our formulation allows us to define interesting bounds on the number of small grammars and to empirically compare different grammars of small size.
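The second task can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that constituents are plain substrings, that a constituent may not be used to parse itself, and that grammar size is the total length of all right-hand sides. The function names min_parse, minimal_grammar and grammar_size are hypothetical, chosen for this example only.

```python
def min_parse(word, constituents):
    """Return a shortest list of symbols (single characters or constituents) spelling `word`."""
    n = len(word)
    # best[i] = (cost, previous index, last symbol) for the prefix word[:i]
    best = [(0, None, None)] + [None] * n
    for i in range(1, n + 1):
        # Default: emit the last character as a terminal symbol.
        best[i] = (best[i - 1][0] + 1, i - 1, word[i - 1])
        for c in constituents:
            j = i - len(c)
            # Assumption: a constituent may not be used to parse itself.
            if c != word and j >= 0 and word.startswith(c, j):
                cost = best[j][0] + 1
                if cost < best[i][0]:
                    best[i] = (cost, j, c)
    # Walk the back-pointers to recover the parse.
    symbols, i = [], n
    while i > 0:
        _, j, sym = best[i]
        symbols.append(sym)
        i = j
    return symbols[::-1]


def minimal_grammar(sequence, constituents):
    """Map the sequence and every constituent to a minimal right-hand side."""
    words = [sequence] + [c for c in constituents if c != sequence]
    return {w: min_parse(w, constituents) for w in words}


def grammar_size(grammar):
    """Assumed size measure: total number of symbols on all right-hand sides."""
    return sum(len(rhs) for rhs in grammar.values())


g = minimal_grammar("abcabcabc", ["abc"])
print(g, grammar_size(g))
```

Each word is parsed by dynamic programming over its prefixes, so the cost is polynomial in the length of the sequence and the number of constituents. The first task, choosing the constituents, is what determines the final size: in this sketch, adding "abcabc" to the constituent list for the same sequence yields a larger grammar than using "abc" alone.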
1 :  Grupo de Procesamiento de Lenguaje Natural (PLN - FaMAF)
Universidad Nacional de Córdoba
2 :  SYMBIOSE (INRIA - IRISA)
CNRS : UMR6074 – INRIA – Institut National des Sciences Appliquées (INSA) - Rennes – Université de Rennes 1
Computer Science/Machine Learning
Computer Science/Information Theory and Coding
Mathematics/Information Theory and Coding
Computer Science/Data Structures and Algorithms
smallest grammar problem – hierarchical structure inference – optimal parsing – data discovery
List of files attached to this document:
PDF
algorithms-04-00262-v2.pdf(536.3 KB)