
Learning State Machine-based String Edit Kernels

Abstract : In recent years, several approaches have been proposed to derive string kernels from probability distributions. For instance, the Fisher kernel uses a generative model M (e.g. a hidden Markov model) and compares two strings according to how they are generated by M. Marginalized kernels, on the other hand, compute the joint similarity between two instances by summing conditional probabilities. In this paper, we adapt this approach to edit distance-based conditional distributions and present a way to learn a new string edit kernel. We show that the practical computation of such a kernel between two strings x and x' built from an alphabet \Sigma requires (i) learning edit probabilities in the form of the parameters of a stochastic state machine and (ii) computing an infinite sum over \Sigma^* by resorting to the intersection of probabilistic automata, as done for rational kernels. Experiments on a handwritten character recognition task show that our new kernel outperforms not only state-of-the-art string kernels and string edit kernels but also the standard edit distance used by a neighborhood-based classifier.
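
As a rough illustration of the two ingredients named in the abstract, the sketch below assumes the kernel takes the marginalized form k(x, x') = sum over y in \Sigma^* of p(y|x) p(y|x'), with p(y|x) given by a memoryless conditional edit model. This is only one reading of the abstract, not the authors' implementation: the edit probabilities are fixed by hand rather than learned, the infinite sum is truncated by brute force at a maximum length instead of being computed through automata intersection, and the names make_edit_model, cond_prob and truncated_edit_kernel are hypothetical.

```python
from itertools import product


def make_edit_model(alphabet, p_ins=0.05, p_del=0.05, p_match=0.8):
    """Toy conditional edit probabilities (hand-set assumption, not learned).

    For every input symbol a: insertions + deletion + substitutions sum to 1.
    After the input is consumed: insertions + stop sum to 1.
    Requires an alphabet of at least two symbols.
    """
    k = len(alphabet)
    p_mis = 1.0 - p_ins - p_del - p_match  # mass shared by mismatching substitutions
    ins = {b: p_ins / k for b in alphabet}
    dele = {a: p_del for a in alphabet}
    sub = {(a, b): (p_match if a == b else p_mis / (k - 1))
           for a in alphabet for b in alphabet}
    stop = 1.0 - p_ins
    return ins, dele, sub, stop


def cond_prob(y, x, model):
    """p(y | x): forward dynamic program summing over all edit scripts x -> y."""
    ins, dele, sub, stop = model
    n, m = len(x), len(y)
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if j > 0:  # insert y[j-1]
                alpha[i][j] += alpha[i][j - 1] * ins[y[j - 1]]
            if i > 0:  # delete x[i-1]
                alpha[i][j] += alpha[i - 1][j] * dele[x[i - 1]]
            if i > 0 and j > 0:  # substitute x[i-1] by y[j-1]
                alpha[i][j] += alpha[i - 1][j - 1] * sub[(x[i - 1], y[j - 1])]
    return alpha[n][m] * stop


def truncated_edit_kernel(x, xp, model, alphabet, max_len):
    """Approximate sum_y p(y|x) p(y|x') over all y with len(y) <= max_len."""
    total = 0.0
    for length in range(max_len + 1):
        for y in product(alphabet, repeat=length):
            total += cond_prob(y, x, model) * cond_prob(y, xp, model)
    return total


if __name__ == "__main__":
    sigma = "ab"
    model = make_edit_model(sigma)
    print(truncated_edit_kernel("ab", "ab", model, sigma, 6))
    print(truncated_edit_kernel("ab", "ba", model, sigma, 6))
```

The enumeration grows exponentially with max_len, which is why an exact method such as the automata intersection mentioned in the abstract is needed in practice; the truncation here is only meant to make the definition concrete on very short strings.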
Document type :
Journal articles

Cited literature [31 references]

https://hal.archives-ouvertes.fr/hal-00462538
Contributor : Marc Sebban
Submitted on : Wednesday, January 7, 2015 - 3:19:35 PM
Last modification on : Tuesday, December 8, 2020 - 9:48:35 AM
Long-term archiving on : Saturday, April 15, 2017 - 2:19:50 PM

File

pr10.pdf
Files produced by the author(s)

Citation

Aurélien Bellet, Marc Bernard, Thierry Murgue, Marc Sebban. Learning State Machine-based String Edit Kernels. Pattern Recognition, Elsevier, 2010, 43 (2010), pp.2330-2339. ⟨10.1016/j.patcog.2009.12.008⟩. ⟨hal-00462538⟩
