A new algorithm for Automatic Word Classification based on an Improved Simulated Annealing Technique

Kamel Smaïli 1 François Charpillet 2 Jean-Paul Haton 3
1 SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
2 MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
3 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In this work, we present a new method for clustering words into equivalence classes within a bigram word model (the method is extensible to a trigram model). This method is based on an improved version of the simulated annealing technique. The main idea is to define a dedicated transition function instead of the random transition function used in classical approaches. Our transition function specifies how the classes may evolve during the simulated annealing process: it relies on a set of stochastic rules, based on linguistic criteria, that select the words which may be moved from one class to another. This method overcomes the two main drawbacks of the classical approach: its time complexity and its random transition function. In this paper, we compare the performance of both approaches, classical simulated annealing and the proposed one, and we report the results of an evaluation on two French databases. The classical approach is 2.5 times slower than ours and, despite a low perplexity value, yields a classification that is not as good as the one given by our approach.
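The abstract does not give the linguistic rules behind the proposed transition function, so the sketch below only illustrates the classical baseline it is compared against: simulated annealing over a word-to-class mapping with a random transition (move one random word to another random class). It assumes the objective is the log-likelihood of the corpus under a class-bigram model with maximum-likelihood estimates, P(w_i | w_{i-1}) = P(c_i | c_{i-1}) · P(w_i | c_i), which is equivalent to minimizing perplexity. All function names, the geometric cooling schedule, its parameters, and the toy corpus are hypothetical choices for illustration; the `propose` argument marks where a rule-based transition function such as the authors' could be plugged in.

```python
import math
import random
from collections import Counter


def class_bigram_log_likelihood(tokens, cls):
    """Corpus log-likelihood under a class-bigram model (assumed objective):
    P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i), ML estimates."""
    word_count = Counter(tokens)
    class_count = Counter(cls[w] for w in tokens)
    pairs = list(zip(tokens, tokens[1:]))
    class_pair = Counter((cls[a], cls[b]) for a, b in pairs)
    left_count = Counter(cls[a] for a, _ in pairs)
    ll = 0.0
    for a, b in pairs:
        ca, cb = cls[a], cls[b]
        ll += math.log(class_pair[(ca, cb)] / left_count[ca])
        ll += math.log(word_count[b] / class_count[cb])
    return ll


def perplexity(tokens, cls):
    """Per-bigram perplexity of the corpus under the class-bigram model."""
    return math.exp(-class_bigram_log_likelihood(tokens, cls) / (len(tokens) - 1))


def random_move(cls, n_classes, rng):
    """Classical random transition: move one random word to a different class.
    A rule-based transition (hypothetical here) would replace this function."""
    new = dict(cls)
    w = rng.choice(sorted(new))
    new[w] = rng.choice([c for c in range(n_classes) if c != new[w]])
    return new


def anneal(tokens, n_classes, propose=random_move, t0=1.0, alpha=0.95,
           steps=2000, seed=0):
    """Simulated annealing maximizing the class-bigram log-likelihood."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    cls = {w: i % n_classes for i, w in enumerate(vocab)}  # round-robin start
    score = class_bigram_log_likelihood(tokens, cls)
    best, best_score = cls, score
    t = t0
    for _ in range(steps):
        cand = propose(cls, n_classes, rng)
        cand_score = class_bigram_log_likelihood(tokens, cand)
        delta = cand_score - score
        # Metropolis criterion: always accept improvements, sometimes accept worse states.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            cls, score = cand, cand_score
            if score > best_score:
                best, best_score = cls, score
        t = max(t * alpha, 1e-12)  # geometric cooling, kept strictly positive
    return best, best_score
```

For example, `anneal("le chat mange la souris le chien mange le chat la souris mange".split(), n_classes=3)` returns the best mapping found and its log-likelihood, and `perplexity` can then be evaluated on that mapping. The objective is recomputed from scratch on every move to keep the sketch short; a practical implementation would update the counts incrementally, which matters given that time complexity is one of the drawbacks the abstract attributes to the classical approach.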
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01112891
Contributor : Kamel Smaïli
Submitted on : Wednesday, February 4, 2015 - 11:07:22 AM
Last modification on : Tuesday, December 18, 2018 - 4:40:21 PM
Long-term archiving on: Thursday, September 10, 2015 - 4:00:23 PM

File

CSNLP96.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01112891, version 1

Citation

Kamel Smaïli, François Charpillet, Jean-Paul Haton. A new algorithm for Automatic Word Classification based on an Improved Simulated Annealing Technique. The 5th International Conference on the Cognitive Science of Natural Language Processing, 1996, Dublin, Ireland. ⟨hal-01112891⟩
