Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning

Abstract : Learning probabilistic models over strings is an important problem in many applications. Spectral methods offer elegant solutions to the problem of inferring weighted automata from finite samples of variable-length strings drawn from an unknown target distribution. These methods rely on a singular value decomposition of a matrix $H_S$, called the Hankel matrix, that records the frequencies of (some of) the observed strings. The accuracy of the learned distribution depends both on the quantity of information embedded in $H_S$ and on the distance between $H_S$ and its mean $H_r$. Existing concentration bounds seem to indicate that the concentration of $H_S$ around $H_r$ gets looser as the size of $H_r$ grows, which suggests a trade-off between the quantity of information used and the size of $H_r$. We propose new dimension-free concentration bounds for several variants of Hankel matrices. Experiments demonstrate that these bounds are tight and that they significantly improve on existing bounds. These results suggest that the concentration rate of the Hankel matrix around its mean does not constitute an argument for limiting its size.
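To make the object in the abstract concrete, the following is a minimal sketch of an empirical Hankel matrix $H_S$ built from a sample of strings. It assumes a finite set of prefixes and suffixes chosen by hand; the function name `empirical_hankel`, the toy sample, and the index sets are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def empirical_hankel(sample, prefixes, suffixes):
    """Return H_S as a list of rows.

    H_S[i][j] is the relative frequency, in `sample`, of the string
    prefixes[i] + suffixes[j]. Strings that never occur get frequency 0.
    """
    counts = Counter(sample)
    n = len(sample)
    return [[counts[u + v] / n for v in suffixes] for u in prefixes]


# Hypothetical toy sample over the alphabet {a, b}; "" is the empty string.
sample = ["", "a", "a", "ab", "b", "ab", "a", "aa"]
prefixes = ["", "a", "b"]
suffixes = ["", "a", "b"]

H = empirical_hankel(sample, prefixes, suffixes)
for row in H:
    print(row)
```

Entries indexed by different (prefix, suffix) pairs that concatenate to the same string are equal, for example the entries for ("", "a") and ("a", ""). Enlarging the prefix and suffix sets records more of the observed strings, which is the matrix size the abstract refers to.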
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01009395
Contributor : Amaury Habrard
Submitted on : Tuesday, June 17, 2014 - 6:22:33 PM
Last modification on : Tuesday, April 2, 2019 - 1:43:31 AM
Long-term archiving on : Wednesday, September 17, 2014 - 11:41:00 AM

File

denis14.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01009395, version 1

Citation

Francois Denis, Mattias Gybels, Amaury Habrard. Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning. The International Conference on Machine Learning (ICML), Jun 2014, China. JMLR: W&CP volume 32. ⟨hal-01009395⟩
