Ziv Lempel Compression of Huge Natural Language Data Tries Using Suffix Arrays - HAL open archive
Conference paper, Year: 1999

Ziv Lempel Compression of Huge Natural Language Data Tries Using Suffix Arrays

Abstract

We present a data structure for storing huge natural language data sets that is very efficient in both space and access speed. The structure, an LZ (Ziv Lempel) compressed linked-list trie, goes a step beyond the directed acyclic word graph in automata compression. We use it to store DELAF, a huge French lexicon with syntactic, grammatical and lexical information associated with each word. The compressed structure can be produced in O(N) time using suffix trees to find repetitions in the trie, but for large data sets space requirements are more prohibitive than time, so suffix arrays are used instead, giving a compression time complexity of O(N log N) for all but the largest data sets.
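
To illustrate how a suffix array can expose repetitions in a sequence, the following is a minimal Python sketch, not the authors' implementation. It assumes the trie has already been flattened into a list of node records (the `(label, flag)` tuples below are hypothetical), builds the suffix array by simple prefix doubling (O(N log² N) with a comparison sort, rather than the O(N log N) bound quoted in the abstract), and uses Kasai's LCP computation to report the longest repeated run. A real LZ-style compressor would additionally have to handle overlapping occurrences and decide which repeats are worth replacing by pointers.

```python
def build_suffix_array(seq):
    """Suffix array by prefix doubling; O(N log^2 N) with Python's sort."""
    n = len(seq)
    if n == 0:
        return []
    # Initial rank of each suffix is the rank of its first symbol.
    order = {sym: i for i, sym in enumerate(sorted(set(seq)))}
    rank = [order[s] for s in seq]
    sa = list(range(n))
    k = 1
    while True:
        def key(i):
            # Rank of the first k symbols, then of the next k symbols.
            return (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all suffixes are now distinguished
            break
        k *= 2
    return sa


def build_lcp(seq, sa):
    """Kasai's algorithm: lcp[j] is the common prefix length of
    the suffixes starting at sa[j - 1] and sa[j]; lcp[0] is 0."""
    n = len(seq)
    pos = [0] * n
    for j, i in enumerate(sa):
        pos[i] = j
    lcp = [0] * n
    h = 0
    for i in range(n):
        if pos[i] > 0:
            j = sa[pos[i] - 1]
            while i + h < n and j + h < n and seq[i + h] == seq[j + h]:
                h += 1
            lcp[pos[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def longest_repeat(seq):
    """Return (first_start, second_start, length) of a longest repeated run,
    or (0, 0, 0) if no symbol repeats. The two occurrences may overlap."""
    sa = build_suffix_array(seq)
    lcp = build_lcp(seq, sa)
    if not lcp:
        return (0, 0, 0)
    j = max(range(len(lcp)), key=lcp.__getitem__)
    if lcp[j] == 0:
        return (0, 0, 0)
    first, second = sorted((sa[j - 1], sa[j]))
    return (first, second, lcp[j])


# Hypothetical flattened trie: each record is (label, end_of_word_flag).
nodes = [("a", 0), ("b", 1), ("c", 1), ("x", 0), ("b", 1), ("c", 1)]
repeat = longest_repeat(nodes)
```

Here `repeat` identifies the run `("b", 1), ("c", 1)`, which occurs at positions 1 and 4; in an LZ-style scheme the second occurrence is the kind of span that could be replaced by a reference to the first.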

Main file
cpm99.pdf (170.56 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00189726, version 1 (22-11-2007)

Identifiers

  • HAL Id: hal-00189726, version 1

Cite

Strahil Ristov, Eric Laporte. Ziv Lempel Compression of Huge Natural Language Data Tries Using Suffix Arrays. Ziv Lempel Compression of Huge Natural Language Data Tries Using Suffix Arrays, 1999, Warwick, United Kingdom. pp.196-211. ⟨hal-00189726⟩