Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning - HAL open archive
Journal article, Journal of Machine Learning Research, Year: 2016

Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning

François Denis, Mattias Gybels, Amaury Habrard

Abstract

Learning probabilistic models over strings is an important issue for many applications. Spectral methods propose elegant solutions to the problem of inferring weighted automata from finite samples of variable-length strings drawn from an unknown target distribution $p$. These methods rely on a singular value decomposition of a matrix $\mathbf{H}_S$, called the empirical Hankel matrix, that records the frequencies of (some of) the observed strings $S$. The accuracy of the learned distribution depends both on the quantity of information embedded in $\mathbf{H}_S$ and on the distance between $\mathbf{H}_S$ and its mean $\mathbf{H}_p$. Existing concentration bounds seem to indicate that the concentration over $\mathbf{H}_p$ gets looser with its dimensions, suggesting that it might be necessary to bound the dimensions of $\mathbf{H}_S$ for learning. We prove new dimension-free concentration bounds for classical Hankel matrices and several variants, based on prefixes or factors of strings, that are useful for learning. Experiments demonstrate that these bounds are tight and that they significantly improve existing (dimension-dependent) bounds. One consequence of these results is that the spectral learning approach remains consistent even if all the observations are recorded within the empirical matrix.
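
To make the objects named in the abstract concrete, here is a minimal sketch of how an empirical Hankel matrix can be built from a sample and compared with its mean. It is not the authors' code. The toy distribution `p`, the sampler `draw`, the choice of prefix/suffix basis, and all function names are assumptions made for illustration. The sketch measures the Frobenius distance, which upper-bounds the spectral norm and needs no linear-algebra library; the bounds in the paper may be stated for a different norm.

```python
import math
import random
from collections import Counter


def empirical_hankel(sample, prefixes, suffixes):
    """H_S[u][v] = relative frequency of the string u + v in the sample."""
    counts = Counter(sample)
    n = len(sample)
    return [[counts[u + v] / n for v in suffixes] for u in prefixes]


def true_hankel(p, prefixes, suffixes):
    """H_p[u][v] = p(u + v), the entrywise mean of H_S."""
    return [[p(u + v) for v in suffixes] for u in prefixes]


def frobenius_distance(a, b):
    """Frobenius norm of a - b (an upper bound on the spectral norm)."""
    return math.sqrt(
        sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    )


def p(w):
    """Toy target (assumption): stop with probability 1/2 at each step,
    otherwise append 'a' or 'b' uniformly, so p(w) = (1/2) * (1/4)^|w|."""
    return 0.5 * 0.25 ** len(w)


def draw(rng):
    """Draw one string from the toy distribution p."""
    w = ""
    while rng.random() >= 0.5:
        w += rng.choice("ab")
    return w


def demo(n, seed=0):
    """Distance between H_S and H_p for a sample of n strings."""
    rng = random.Random(seed)
    basis = ["", "a", "b", "aa", "ab", "ba", "bb"]  # hypothetical basis
    sample = [draw(rng) for _ in range(n)]
    h_s = empirical_hankel(sample, basis, basis)
    h_p = true_hankel(p, basis, basis)
    return frobenius_distance(h_s, h_p)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, demo(n))
```

Running the script prints the distance for growing sample sizes; enlarging `basis` adds rows and columns to both matrices, which is the regime the dimension-free bounds address.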
Main file
14-501.pdf (540.39 KB)
Origin: Publisher files allowed on an open archive
Comment: Link: http://jmlr.org/papers/v17/14-501.html

Dates and versions

hal-01306915, version 1 (25-04-2016)

Identifiers

  • HAL Id: hal-01306915, version 1

Cite

François Denis, Mattias Gybels, Amaury Habrard. Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning. Journal of Machine Learning Research, 2016, 17 (31), pp.1-32. ⟨hal-01306915⟩