Good edit similarity learning by loss minimization - HAL open archive
Journal article in Machine Learning, 2012

Good edit similarity learning by loss minimization

Aurélien Bellet
Amaury Habrard
Marc Sebban

Abstract

Similarity functions are a fundamental component of many learning algorithms. When dealing with string or tree-structured data, edit distance-based measures are widely used, and there exist a few methods for learning them from data. However, these methods offer no theoretical guarantee as to the generalization ability and discriminative power of the learned similarities. In this paper, we propose a loss minimization-based edit similarity learning approach, called GESL. It is driven by the notion of (ε, γ, τ)-goodness, a theory that bridges the gap between the properties of a similarity function and its performance in classification. We show that our learning framework is a suitable way to deal not only with strings but also with tree-structured data. Using the notion of uniform stability, we derive generalization guarantees for a large class of loss functions. We also provide experimental results on two real-world datasets which show that edit similarities learned with GESL induce more accurate and sparser classifiers than other (standard or learned) edit similarities.
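As background for the abstract, the sketch below illustrates the general idea of edit distance-based similarity classification: a standard dynamic-programming Levenshtein distance is turned into a similarity and plugged into a simple similarity-based linear classifier, in the spirit of the (ε, γ, τ)-goodness framework. This is an illustrative example only, not the authors' GESL method or their learned edit costs; the `landmarks` structure and the exponential mapping are assumptions made here for the sketch.

```python
import math

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance between two strings."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[m][n]

def similarity(a, b):
    """Map the edit distance to a similarity in (0, 1] (illustrative choice)."""
    return math.exp(-edit_distance(a, b))

def classify(x, landmarks):
    """Similarity-based classifier: sign of the similarity-weighted vote of
    labeled landmark strings (z, y) with y in {-1, +1}."""
    score = sum(y * similarity(x, z) for z, y in landmarks)
    return 1 if score >= 0 else -1
```

With standard (unlearned) edit costs as above, `edit_distance("kitten", "sitting")` is 3; GESL's contribution, per the abstract, is learning the edit costs so that the resulting similarity induces accurate, sparse classifiers of this form.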
Main file: MLJ2012-preprint.pdf (436.06 KB)
Origin: files produced by the author(s)

Dates and versions

hal-00690240, version 1 (20-08-2012)

Identifiers

Cite

Aurélien Bellet, Amaury Habrard, Marc Sebban. Good edit similarity learning by loss minimization. Machine Learning, 2012, 89, pp.5-35. ⟨10.1007/s10994-012-5293-8⟩. ⟨hal-00690240⟩
155 views
823 downloads
