Stochastic Subsampling for Factorizing Huge Matrices

Abstract: We present a matrix-factorization algorithm that scales to input matrices with a huge number of both rows and columns. Learned factors may be sparse or dense and/or non-negative, which makes our algorithm suitable for dictionary learning, sparse component analysis, and non-negative matrix factorization. Our algorithm streams matrix columns while subsampling them to iteratively learn the matrix factors. At each iteration, the row dimension of a new sample is reduced by subsampling, resulting in lower time complexity compared to a simple streaming algorithm. Our method is guaranteed to converge to a stationary point of the matrix-factorization problem. We demonstrate its efficiency on massive functional Magnetic Resonance Imaging data (2 TB), and on patches extracted from hyperspectral images (103 GB). For both problems, which involve different penalties on rows and columns, we obtain significant speed-ups compared to state-of-the-art algorithms.
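The idea of streaming columns while subsampling their rows can be illustrated with a small NumPy sketch. This is a deliberately simplified stand-in, not the algorithm of the paper: it uses a ridge-regression code and a normalized gradient step on the sampled rows only, with no sparsity or non-negativity penalty and no convergence guarantee. The function names (`subsampled_streaming_mf`, `relative_error`) and all parameters (`reduction`, `ridge`, `step`) are hypothetical and chosen for illustration.

```python
import numpy as np


def relative_error(X, D):
    """Relative Frobenius error of the best least-squares fit of X by D."""
    codes = np.linalg.lstsq(D, X, rcond=None)[0]
    return np.linalg.norm(X - D @ codes) / np.linalg.norm(X)


def subsampled_streaming_mf(X, n_components, reduction=2, n_iter=3000,
                            ridge=0.1, step=0.5, seed=0):
    """Toy streaming factorization X ~ D @ codes with row subsampling.

    Illustrative assumption: each iteration touches one column and only
    about 1/reduction of its rows, so the per-iteration cost depends on
    the number of kept rows rather than on the full row dimension.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    D = rng.standard_normal((n_rows, n_components))
    n_kept = min(n_rows, max(n_components, n_rows // reduction))
    for _ in range(n_iter):
        col = rng.integers(n_cols)                       # stream one column
        rows = rng.choice(n_rows, size=n_kept, replace=False)  # subsample rows
        x_sub, D_sub = X[rows, col], D[rows]
        # Code for this sample, computed from the kept rows only (ridge).
        gram = D_sub.T @ D_sub + ridge * np.eye(n_components)
        code = np.linalg.solve(gram, D_sub.T @ x_sub)
        # Normalized gradient step, applied to the kept rows of D only.
        residual = x_sub - D_sub @ code
        D[rows] += step * np.outer(residual, code) / (code @ code + 1e-12)
    return D
```

A possible usage on synthetic low-rank data, comparing the fit before and after training:

```python
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 200))
D = subsampled_streaming_mf(X, n_components=3)
print(relative_error(X, D))
```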
Document type:
Journal article
IEEE Transactions on Signal Processing, Institute of Electrical and Electronics Engineers, 2018, 66 (1), pp.113-128. 〈http://ieeexplore.ieee.org/document/8038072/〉. 〈10.1109/TSP.2017.2752697〉

https://hal.archives-ouvertes.fr/hal-01431618
Contributor: Arthur Mensch
Submitted on: Monday, October 30, 2017 - 10:18:07
Last modified on: Wednesday, December 13, 2017 - 16:33:15

Files

modl_tsp.pdf
Files produced by the author(s)

Citation

Arthur Mensch, Julien Mairal, Bertrand Thirion, Gael Varoquaux. Stochastic Subsampling for Factorizing Huge Matrices. IEEE Transactions on Signal Processing, Institute of Electrical and Electronics Engineers, 2018, 66 (1), pp.113-128. 〈http://ieeexplore.ieee.org/document/8038072/〉. 〈10.1109/TSP.2017.2752697〉. 〈hal-01431618v3〉
