M. Aharon, M. Elad, and A. M. Bruckstein, K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation, IEEE Transactions on Signal Processing, vol.54, issue.11, pp.4311-4322, 2006.
DOI : 10.1109/TSP.2006.881199

R. K. Ahuja, T. L. Magnanti, and J. Orlin, Network Flows, 1993.

F. Bach, Exploring large feature spaces with hierarchical multiple kernel learning, Advances in Neural Information Processing Systems, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00319660

F. Bach, Structured sparsity-inducing norms through submodular functions, Advances in Neural Information Processing Systems, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00511310

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Convex optimization with sparsity-inducing norms, Optimization for Machine Learning, MIT Press, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00937150

R. Baraniuk, Optimal tree approximation with wavelets, Wavelet Applications in Signal and Image Processing VII, pp.206-214, 1999.

R. G. Baraniuk, R. A. DeVore, G. Kyriazis, and X. M. Yu, Near best tree approximation, Advances in Computational Mathematics, vol.16, issue.4, pp.357-373, 2002.
DOI : 10.1023/A:1014554317692

R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, Model-Based Compressive Sensing, IEEE Transactions on Information Theory, vol.56, issue.4, pp.1982-2001, 2010.
DOI : 10.1109/TIT.2010.2040894

A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542

S. Becker, J. Bobin, and E. Candès, NESTA: A Fast and Accurate First-Order Method for Sparse Recovery, SIAM Journal on Imaging Sciences, vol.4, issue.1, pp.1-39, 2011.
DOI : 10.1137/090756855

Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, vol.2, issue.1, pp.1-127, 2009.
DOI : 10.1561/2200000006

D. P. Bertsekas, Nonlinear programming, Athena Scientific, 1999.

P. Bickel, Y. Ritov, and A. Tsybakov, Simultaneous analysis of Lasso and Dantzig selector, The Annals of Statistics, vol.37, issue.4, pp.1705-1732, 2009.
DOI : 10.1214/08-AOS620
URL : https://hal.archives-ouvertes.fr/hal-00401585

D. Blei and J. McAuliffe, Supervised topic models, Advances in Neural Information Processing Systems, 2008.

D. Blei, A. Ng, and M. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research, vol.3, pp.993-1022, 2003.

D. Blei, T. L. Griffiths, and M. I. Jordan, The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies, Journal of the ACM, vol.57, issue.2, pp.1-30, 2010.
DOI : 10.1145/1667053.1667056

J. M. Borwein and A. S. Lewis, Convex Analysis and Nonlinear Optimization: Theory and Examples, 2006.

S. P. Boyd and L. Vandenberghe, Convex Optimization, 2004.

D. M. Bradley and J. A. , Differentiable sparse coding, Advances in Neural Information Processing Systems, 2009.

P. Brucker, An O(n) algorithm for quadratic knapsack problems, Operations Research Letters, vol.3, issue.3, pp.163-166, 1984.
DOI : 10.1016/0167-6377(84)90010-5

W. L. Buntine, Variational Extensions to EM and Multinomial PCA, Proceedings of the European Conference on Machine Learning (ECML), 2002.
DOI : 10.1007/3-540-36755-1_3

S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic Decomposition by Basis Pursuit, SIAM Journal on Scientific Computing, vol.20, issue.1, pp.33-61, 1998.
DOI : 10.1137/S1064827596304010

X. Chen, Q. Lin, S. Kim, J. Peña, J. G. Carbonell et al., An efficient proximal-gradient method for single and multi-task regression with structured sparsity, 2010.

R. R. Coifman and D. L. Donoho, Translation-invariant de-noising, Wavelets and Statistics, Lecture Notes in Statistics, vol.103, pp.125-150, 1995.

P. L. Combettes and J. Pesquet, Proximal Splitting Methods in Signal Processing, Fixed-Point Algorithms for Inverse Problems in Science and Engineering, 2010.
DOI : 10.1007/978-1-4419-9569-8_10
URL : https://hal.archives-ouvertes.fr/hal-00643807

M. Crouse, R. D. Nowak, and R. G. Baraniuk, Wavelet-based statistical signal processing using hidden Markov models, IEEE Transactions on Signal Processing, vol.46, issue.4, pp.886-902, 1998.
DOI : 10.1109/78.668544

D. L. Donoho, CART and best-ortho-basis: a connection, The Annals of Statistics, vol.25, issue.5, pp.1870-1911, 1997.
DOI : 10.1214/aos/1069362377

D. L. Donoho and I. M. Johnstone, Adapting to Unknown Smoothness via Wavelet Shrinkage, Journal of the American Statistical Association, vol.90, issue.432, pp.1200-1224, 1995.
DOI : 10.1080/01621459.1995.10476626
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research, vol.10, pp.2899-2934, 2009.

B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression, The Annals of Statistics, vol.32, issue.2, pp.407-451, 2004.

M. Elad and M. Aharon, Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries, IEEE Transactions on Image Processing, vol.15, issue.12, pp.3736-3745, 2006.
DOI : 10.1109/TIP.2006.881969

J. Friedman, T. Hastie, and R. Tibshirani, A note on the group lasso and a sparse group lasso, 2010.

T. L. Griffiths and M. Steyvers, Finding scientific topics, Proceedings of the National Academy of Sciences, vol.101, suppl.1, pp.5228-5235, 2004.
DOI : 10.1073/pnas.0307752101

T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2009.

L. He and L. Carin, Exploiting structure in wavelet-based Bayesian compressive sensing, IEEE Transactions on Signal Processing, vol.57, pp.3488-3497, 2009.

C. Hu, J. T. Kwok, and W. Pan, Accelerated gradient methods for stochastic optimization and online learning, Advances in Neural Information Processing Systems, 2009.

J. Huang, T. Zhang, and D. Metaxas, Learning with structured sparsity, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553429

L. Jacob, G. Obozinski, and J. Vert, Group lasso with overlap and graph lasso, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553431

R. Jenatton, J. Audibert, and F. Bach, Structured variable selection with sparsity-inducing norms, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00377732

R. Jenatton, J. Mairal, G. Obozinski, and F. Bach, Proximal methods for sparse hierarchical dictionary learning, Proceedings of the International Conference on Machine Learning (ICML), 2010.

S. C. Johnson, Hierarchical clustering schemes, Psychometrika, vol.32, issue.3, pp.241-254, 1967.
DOI : 10.1007/BF02289588

S. Kim and E. P. Xing, Tree-guided group Lasso for multi-task regression with structured sparsity, Proceedings of the International Conference on Machine Learning (ICML), 2010.

S. Lacoste-Julien, F. Sha, and M. I. Jordan, DiscLDA: Discriminative learning for dimensionality reduction and classification, Advances in Neural Information Processing Systems, 2008.

D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, issue.6755, pp.788-791, 1999.

H. Lee, A. Battle, R. Raina, and A. Y. Ng, Efficient sparse coding algorithms, Advances in Neural Information Processing Systems, 2007.

N. Maculan and J. R. Galdino-de-paula, A linear-time median-finding algorithm for projecting a vector on the simplex of ℝ^n, Operations Research Letters, vol.8, issue.4, pp.219-222, 1989.
DOI : 10.1016/0167-6377(89)90064-3

J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, Supervised dictionary learning, Advances in Neural Information Processing Systems, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00322431

J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, Non-local sparse models for image restoration, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459452

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online learning for matrix factorization and sparse coding, Journal of Machine Learning Research, vol.11, issue.1, pp.19-60, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00408716

J. Mairal, R. Jenatton, G. Obozinski, and F. Bach, Network flow algorithms for structured sparsity, Advances in Neural Information Processing Systems, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00512556

S. G. Mallat, A wavelet tour of signal processing, 1999.

D. Martin, C. Fowlkes, D. Tal, and J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, 2001.
DOI : 10.1109/ICCV.2001.937655

C. A. Micchelli, J. M. Morales, and M. Pontil, A family of penalty functions for structured sparsity, Advances in Neural Information Processing Systems, 2010.

J. J. Moreau, Fonctions convexes duales et points proximaux dans un espace hilbertien, C. R. Acad. Sci. Paris Sér. A Math, vol.255, pp.2897-2899, 1962.

D. Needell and J. A. Tropp, CoSaMP: Iterative signal recovery from incomplete and inaccurate samples, Communications of the ACM, vol.53, issue.12, pp.93-100, 2010.
DOI : 10.1145/1859204.1859229

Y. Nesterov, Gradient methods for minimizing composite objective function, Center for Operations Research and Econometrics (CORE), 2007.

B. A. Olshausen and D. J. Field, Sparse coding with an overcomplete basis set: A strategy employed by V1?, Vision Research, vol.37, issue.23, pp.3311-3325, 1997.
DOI : 10.1016/S0042-6989(97)00169-7

M. Schmidt and K. Murphy, Convex structure learning in log-linear models: Beyond pairwise potentials, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.

J. M. Shapiro, Embedded image coding using zerotrees of wavelet coefficients, IEEE Transactions on Signal Processing, vol.41, issue.12, pp.3445-3462, 1993.
DOI : 10.1109/78.258085

P. Sprechmann, I. Ramirez, G. Sapiro, and Y. C. Eldar, Collaborative hierarchical sparse modeling, 2010 44th Annual Conference on Information Sciences and Systems (CISS), 2010.
DOI : 10.1109/CISS.2010.5464845

G. W. Stewart and J. Sun, Matrix Perturbation Theory (Computer Science and Scientific Computing), 1990.

M. Stojnic, F. Parvaresh, and B. Hassibi, On the Reconstruction of Block-Sparse Signals With an Optimal Number of Measurements, IEEE Transactions on Signal Processing, vol.57, issue.8, pp.3075-3085, 2009.
DOI : 10.1109/TSP.2009.2020754

R. Tibshirani, Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society. Series B, vol.58, issue.1, pp.267-288, 1996.

J. A. Tropp, Greed is Good: Algorithmic Results for Sparse Approximation, IEEE Transactions on Information Theory, vol.50, issue.10, pp.2231-2242, 2004.
DOI : 10.1109/TIT.2004.834793

J. A. Tropp, Just relax: convex programming methods for identifying sparse signals in noise, IEEE Transactions on Information Theory, vol.52, issue.3, 2006.
DOI : 10.1109/TIT.2005.864420

M. J. Wainwright, Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using ℓ1-Constrained Quadratic Programming (Lasso), IEEE Transactions on Information Theory, vol.55, issue.5, pp.2183-2202, 2009.
DOI : 10.1109/TIT.2009.2016018

S. J. Wright, R. D. Nowak, and M. A. Figueiredo, Sparse Reconstruction by Separable Approximation, IEEE Transactions on Signal Processing, vol.57, issue.7, pp.2479-2493, 2009.
DOI : 10.1109/TSP.2009.2016892

L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research, vol.11, pp.2543-2596, 2010.

K. Yu, T. Zhang, and Y. Gong, Nonlinear learning using local coordinate coding, Advances in Neural Information Processing Systems, 2009.

G. X. Yuan, K. W. Chang, C. J. Hsieh, and C. J. Lin, Comparison of optimization methods and software for large-scale l1-regularized linear classification, Journal of Machine Learning Research, vol.11, pp.3183-3234, 2010.

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006.
DOI : 10.1111/j.1467-9868.2005.00532.x

P. Zhao, G. Rocha, and B. Yu, The composite absolute penalties family for grouped and hierarchical variable selection, The Annals of Statistics, vol.37, issue.6A, pp.3468-3497, 2009.
DOI : 10.1214/07-AOS584

J. Zhu, A. Ahmed, and E. P. Xing, MedLDA: Maximum margin supervised topic models for regression and classification, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553535

D. Zoran and Y. Weiss, The "tree-dependent components" of natural scenes are edge filters, Advances in Neural Information Processing Systems, 2009.