Good edit similarity learning by loss minimization

Aurélien Bellet; Amaury Habrard; Marc Sebban

doi:10.1007/s10994-012-5293-8

Journal article, Machine Learning, Year: 2012

Aurélien Bellet

Role: Author
PersonId: 9877
IdHAL: aurelien-bellet
ORCID: 0000-0003-3440-1251

Laboratoire Hubert Curien

Amaury Habrard

Role: Author
PersonId: 439
IdHAL: amaury-habrard
ORCID: 0000-0003-3038-9347
IdRef: 084103655

Laboratoire Hubert Curien

Marc Sebban

Role: Author
PersonId: 5203
IdHAL: marc-sebban
ORCID: 0000-0001-6851-169X
IdRef: 050802623

Laboratoire Hubert Curien

Abstract

Similarity functions are a fundamental component of many learning algorithms. When dealing with string or tree-structured data, edit distance-based measures are widely used, and there exist a few methods for learning them from data. However, these methods offer no theoretical guarantee as to the generalization ability and discriminative power of the learned similarities. In this paper, we propose a loss minimization-based edit similarity learning approach, called GESL. It is driven by the notion of (ε, γ, τ)-goodness, a theory that bridges the gap between the properties of a similarity function and its performance in classification. We show that our learning framework is a suitable way to deal not only with strings but also with tree-structured data. Using the notion of uniform stability, we derive generalization guarantees for a large class of loss functions. We also provide experimental results on two real-world datasets which show that edit similarities learned with GESL induce more accurate and sparser classifiers than other (standard or learned) edit similarities.
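For readers unfamiliar with the edit distance-based measures mentioned in the abstract, the following is a minimal sketch of the classical dynamic-programming edit distance with pluggable operation costs. It is an illustration only, not the GESL algorithm: the function name `edit_distance` and the parameters `sub_cost` and `indel_cost` are hypothetical names chosen for this example, and the abstract does not specify how GESL parameterizes or learns its costs.

```python
from typing import Callable, Sequence


def edit_distance(
    x: Sequence[str],
    y: Sequence[str],
    sub_cost: Callable[[str, str], float] = lambda a, b: 0.0 if a == b else 1.0,
    indel_cost: Callable[[str], float] = lambda a: 1.0,
) -> float:
    """Cheapest cost of rewriting x into y with insertions, deletions, substitutions.

    The cost functions are illustrative assumptions; with the defaults this is
    the standard Levenshtein distance.
    """
    # First row: turning the empty prefix of x into y[:j] needs j insertions.
    prev = [0.0]
    for b in y:
        prev.append(prev[-1] + indel_cost(b))

    for a in x:
        # First column: turning x[:i] into the empty string needs i deletions.
        curr = [prev[0] + indel_cost(a)]
        for j, b in enumerate(y, start=1):
            curr.append(
                min(
                    prev[j] + indel_cost(a),        # delete a
                    curr[j - 1] + indel_cost(b),    # insert b
                    prev[j - 1] + sub_cost(a, b),   # substitute a by b (free if equal)
                )
            )
        prev = curr
    return prev[-1]
```

With the default unit costs, `edit_distance("kitten", "sitting")` is the Levenshtein distance between the two strings. A method that learns edit costs from data would, in this sketch, supply its own `sub_cost` and `indel_cost` in place of the defaults.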

Domains

Machine Learning [cs.LG]

Main file

MLJ2012-preprint.pdf (436.06 KB)

Origin: Files produced by the author(s)

Contributor: Marc Sebban

https://hal.science/hal-00690240

Submitted on: Monday, 20 August 2012, 14:49:57

Last modified on: Friday, 24 March 2023, 14:52:55

Long-term archiving on: Wednesday, 21 November 2012, 02:20:08

Dates and versions

hal-00690240, version 1 (20-08-2012)

Identifiers

HAL Id: hal-00690240, version 1
DOI: 10.1007/s10994-012-5293-8

Cite

Aurélien Bellet, Amaury Habrard, Marc Sebban. Good edit similarity learning by loss minimization. Machine Learning, 2012, 89, pp.5-35. ⟨10.1007/s10994-012-5293-8⟩. ⟨hal-00690240⟩

Collections

UNIV-ST-ETIENNE IOGS CNRS LAHC PARISTECH UDL

155 Views

823 Downloads
