An Experimental Study on Learning with Good Edit Similarity Functions

Abstract: Similarity functions are essential to many learning algorithms. To be usable in support vector machines (SVM), i.e., for the convergence of the learning algorithm to be guaranteed, they must be valid kernels. In the case of structured data, similarities based on the popular edit distance often do not satisfy this requirement, which explains why they are typically used with k-nearest neighbor (k-NN) classifiers. A common approach to using such edit similarities in SVM anyway is to transform them into potentially (but not provably) valid kernels. Recently, a different theory of learning with (epsilon, gamma, tau)-good similarity functions was proposed, allowing the use of non-kernel similarity functions. Moreover, the resulting models are expected to be sparse, as opposed to standard SVM models, which can be unnecessarily dense. In this paper, we study the relevance and applicability of this theory in the context of string edit similarities. We show that such similarities are naturally good for a given string classification task and provide experimental evidence that the obtained models not only clearly outperform the k-NN approach, but are also competitive with standard SVM models learned with state-of-the-art edit kernels, while being much sparser.
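
To make the two ingredients of the abstract concrete, here is a minimal, hypothetical sketch (not taken from the paper): an edit similarity of the form K(x, x') = exp(-t * d_edit(x, x')), which is not provably a positive semi-definite kernel, and a landmark-based representation of the kind used in the (epsilon, gamma, tau)-good similarity framework, where a sparse linear classifier is learned over similarities to a set of reference strings. The parameter t, the landmark strings, and all function names are illustrative assumptions, not the authors' implementation.

```python
import math

def levenshtein(a: str, b: str) -> int:
    """Standard edit distance with unit insertion/deletion/substitution costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def edit_similarity(x: str, xp: str, t: float = 1.0) -> float:
    """K(x, x') = exp(-t * d_edit(x, x')); not provably PSD, hence not a valid kernel."""
    return math.exp(-t * levenshtein(x, xp))

def similarity_map(x: str, landmarks: list, t: float = 1.0) -> list:
    """phi(x) = [K(x, l_1), ..., K(x, l_d)]: the representation over which the
    good-similarity framework learns a linear classifier with an L1 constraint."""
    return [edit_similarity(x, l, t) for l in landmarks]

# Illustrative landmark set: strings closer in edit distance get higher similarity.
landmarks = ["abba", "abab"]
print(similarity_map("abba", landmarks))  # [1.0, ~0.14]
print(similarity_map("abca", landmarks))  # [~0.37, ~0.14]
```

In this framework, an L1-constrained linear classifier trained on such similarity maps is what yields the sparsity mentioned in the abstract: only a few landmarks end up with non-zero weights, in contrast with potentially dense SVM support vector sets.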
Document type: Conference papers

https://hal.archives-ouvertes.fr/hal-00618706
Contributor: Marc Sebban
Submitted on: Friday, September 2, 2011 - 3:50:08 PM
Last modification on: Tuesday, September 10, 2019 - 11:32:08 AM

Identifiers

  • HAL Id: hal-00618706, version 1

Citation

Aurélien Bellet, Amaury Habrard, Marc Sebban. An Experimental Study on Learning with Good Edit Similarity Functions. ICTAI 2011, Nov 2011, United States. ⟨hal-00618706⟩
