Online clustering of individual sequences

Abstract: $\ell_0$-penalized methods are known to have good theoretical properties but a high computational cost. Convex relaxations such as the Lasso have been introduced as computationally tractable alternatives, but their theoretical guarantees hold only for restricted models. To resolve this impasse, \cite{dalalyantsybakov2,dalalyantsybakov3} introduced sparsity priors in a PAC-Bayesian framework. These priors yield good theoretical properties, namely sparsity oracle inequalities, achieved by computationally attractive sequential procedures. In this paper, we investigate the same issue in clustering. We construct online \textit{clustering} algorithms which learn according to the following game protocol. At each trial $t\geq 1$, nature reveals a deterministic $x_t\in\R^d$, $d\geq 1$. A forecaster predicts the next value with several proposals, using as few as possible. Then nature reveals the next value, and the forecaster pays the minimal distance between this value and its set of proposals. To deal with this problem, we use PAC-Bayesian theory with group-sparsity priors. This yields sparsity regret bounds and allows us to perform online clustering of a possibly non-stationary process, without any knowledge of the number of clusters. These results can be applied to the classical i.i.d. case to address model selection in clustering as well as high-dimensional clustering.
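The game protocol described in the abstract can be illustrated with a minimal sketch. It shows only the interaction loop and the loss, not the PAC-Bayesian procedure of the paper. The function names, the use of the Euclidean distance as "the distance", and the naive baseline forecaster (which proposes the most recent distinct observations) are all assumptions made for illustration.

```python
import math


def min_distance_loss(proposals, x):
    """Loss paid by the forecaster: distance from x to its closest proposal.

    Assumption: the distance is Euclidean.
    """
    return min(math.dist(c, x) for c in proposals)


def recent_points_forecaster(history, max_proposals=2):
    """Hypothetical baseline, not the paper's algorithm.

    Proposes the most recent distinct observations, at most max_proposals.
    """
    distinct = []
    for x in reversed(history):
        if x not in distinct:
            distinct.append(x)
        if len(distinct) == max_proposals:
            break
    return distinct


def play(sequence, forecaster):
    """Run the protocol on a deterministic sequence; return the cumulative loss."""
    history = [sequence[0]]  # nature reveals x_1, no loss is paid yet
    total = 0.0
    for x in sequence[1:]:
        proposals = forecaster(history)           # predict before seeing x
        total += min_distance_loss(proposals, x)  # pay distance to closest proposal
        history.append(x)                         # nature reveals the next value
    return total


# A sequence alternating between two points of R^2.
sequence = [(0.0, 0.0), (10.0, 0.0), (0.0, 0.0), (10.0, 0.0)]
loss_two = play(sequence, recent_points_forecaster)
loss_one = play(sequence, lambda h: recent_points_forecaster(h, max_proposals=1))
```

On this alternating sequence, allowing two proposals lets the baseline stop paying once both points have been seen, whereas a single proposal pays at every trial. This is the trade-off between the cumulative loss and the number of proposals that the abstract refers to.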
Document type:
Preprint, Working paper
Contributor: Sébastien Loustau
Submitted on: Friday, February 7, 2014 - 14:47:40
Last modified on: Monday, February 5, 2018 - 15:00:03
Document(s) archived on: Monday, May 12, 2014 - 12:05:30


Files produced by the author(s)


  • HAL Id : hal-00943384, version 1



Sébastien Loustau. Online clustering of individual sequences. 2014. 〈hal-00943384〉


