J. Ba, V. Mnih, and K. Kavukcuoglu, Multiple object recognition with visual attention, 2014.

D. Benbouzid, R. Busa-Fekete, and B. Kégl, Fast classification using sparse decision DAGs, ICML, 2012.
URL : https://hal.archives-ouvertes.fr/in2p3-00711150

J. Bi, K. Bennett, M. Embrechts, C. Breneman, and M. Song, Dimensionality reduction via sparse support vector machines, JMLR, vol.3, pp.1229-1243, 2003.

M. Bilgic and L. Getoor, Voila: Efficient feature-value acquisition for classification, Proceedings of the national conference on artificial intelligence, 2007.

X. Chai, L. Deng, Q. Yang, and C. X. Ling, Test-cost sensitive naive Bayes classification, Data Mining, ICDM'04, 2004.

O. Chapelle, P. Shivaswamy, S. Vadrevu, K. Weinberger, Y. Zhang et al., Boosted multi-task learning, Machine Learning, vol.85, issue.1-2, pp.149-173, 2011.
DOI : 10.1007/s10994-010-5231-6

M. Chen, K. Q. Weinberger, O. Chapelle, D. Kedem, and Z. Xu, Classifier cascade for minimizing feature evaluation cost, AISTATS, pp.218-226, 2012.

K. Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio, On the properties of neural machine translation, 2014.
DOI : 10.3115/v1/w14-4012
URL : https://doi.org/10.3115/v1/w14-4012

G. Dulac-Arnold, L. Denoyer, P. Preux, and P. Gallinari, Sequential approaches for learning datum-wise sparse representations, Machine Learning, 2012.
DOI : 10.1007/s10994-012-5306-7
URL : https://hal.archives-ouvertes.fr/hal-00747724

I. Guyon and A. Elisseeff, An introduction to variable and feature selection, JMLR, 2003.

S. Ji and L. Carin, Cost-sensitive feature acquisition and classification, Pattern Recognition, vol.40, issue.5, pp.1474-1485, 2007.
DOI : 10.1016/j.patcog.2006.11.008

R. Kohavi and G. H. John, Wrappers for feature subset selection, Artificial Intelligence, vol.97, issue.1, pp.273-324, 1997.
DOI : 10.1016/s0004-3702(97)00043-x
URL : https://doi.org/10.1016/s0004-3702(97)00043-x

V. Mnih, N. Heess, and A. Graves, Recurrent models of visual attention, 2014.

P. Sermanet, A. Frome, and E. Real, Attention for fine-grained categorization, 2014.

K. Trapeznikov and V. Saligrama, Supervised sequential classification under budget constraints, 2013.

P. Viola and M. Jones, Robust real-time object detection, International Journal of Computer Vision, vol.4, pp.51-52, 2001.

K. Weinberger, A. Dasgupta, J. Langford, A. Smola, and J. Attenberg, Feature hashing for large scale multitask learning, 2009.
DOI : 10.1145/1553374.1553516
URL : http://arxiv.org/pdf/0902.2206

D. J. Weiss and B. Taskar, Learning adaptive value of information for structured prediction, NIPS, 2013.

J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio et al., Feature selection for SVMs, 2000.

Z. Xu, G. Huang, K. Q. Weinberger, and A. X. Zheng, Gradient boosted feature selection, 2014.
DOI : 10.1145/2623330.2623635
URL : http://arxiv.org/pdf/1901.04055

Z. Xu, M. J. Kusner, K. Q. Weinberger, M. Chen, and O. Chapelle, Classifier cascades and trees for minimizing feature evaluation cost, 2014.

Z. Xu, K. Weinberger, and O. Chapelle, The greedy miser: Learning under test-time budgets, 2012.

M. Yuan and Y. Lin, Efficient empirical Bayes variable selection and estimation in linear models, Journal of the American Statistical Association, 2005.

Z. Zheng, H. Zha, T. Zhang, O. Chapelle, K. Chen et al., A general boosting method and its application to learning ranking functions for web search, 2008.