M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis et al., TensorFlow: A system for large-scale machine learning, Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI'16, pp.265-283, 2016.

Y. Cho and L. K. Saul, Kernel methods for deep learning, Advances in Neural Information Processing Systems, vol.22, pp.342-350, 2009.

F. Chollet, 2015.

D. Croce, D. Rossini, and R. Basili, Explaining nonlinear classifier decisions within kernel-based deep architectures, Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp.16-24, 2018.

M. Gönen and E. Alpaydın, Multiple kernel learning algorithms, Journal of Machine Learning Research, vol.12, pp.2211-2268, 2011.

T. Hazan and T. Jaakkola, Steps toward deep kernel methods from infinite neural networks, 2015.

U. Heinemann, R. Livni, E. Eban, G. Elidan, and A. Globerson, Improper deep kernels, Artificial Intelligence and Statistics, pp.1159-1167, 2016.

C. Jose, P. Goyal, P. Aggrwal, and M. Varma, Local deep kernel learning for efficient non-linear SVM prediction, International Conference on Machine Learning, pp.486-494, 2013.

Fig. 4: 2-dimensional Nyström representation of 1000 randomly selected test-set samples from the CIFAR-10 dataset, obtained with a subsample set of size 2 and either a linear kernel (left) or a Chi2 kernel (right).

A. Krizhevsky, Learning multiple layers of features from tiny images, 2009.

S. Kumar, M. Mohri, and A. Talwalkar, Sampling methods for the Nyström method, Journal of Machine Learning Research, vol.13, pp.981-1006, 2012.

Q. V. Le, T. Sarlós, and A. Smola, Fastfood: Computing Hilbert space expansions in loglinear time, International Conference on Machine Learning, 2013.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.

Y. LeCun and C. Cortes, The MNIST database of handwritten digits.

J. Mairal, End-to-end kernel learning with supervised convolutional kernel networks, Advances in Neural Information Processing Systems, pp.1399-1407, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01387399

J. Mairal, P. Koniusz, Z. Harchaoui, and C. Schmid, Convolutional kernel networks, Advances in Neural Information Processing Systems, pp.2627-2635, 2014.
URL: https://hal.archives-ouvertes.fr/hal-01005489

G. Montavon, M. L. Braun, and K.-R. Müller, Kernel analysis of deep networks, Journal of Machine Learning Research, vol.12, pp.2563-2581, 2011.

Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu et al., Reading digits in natural images with unsupervised feature learning, 2011.

G. Pandey and A. Dukkipati, Learning by stretching deep networks, Proceedings of the 31st International Conference on Machine Learning, vol.32, pp.22-24, 2014.

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Neural Information Processing Systems, 2007.

A. Rahimi and B. Recht, Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning, Advances in Neural Information Processing Systems 21, pp.1313-1320, 2009.

B. Schölkopf, S. Mika, C. J. C. Burges, P. Knirsch, K.-R. Müller et al., Input space versus feature space in kernel-based methods, IEEE Transactions on Neural Networks, vol.10, issue.5, pp.1000-1017, 1999.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

H. Song, J. J. Thiagarajan, P. Sattigeri, and A. Spanias, Optimizing kernel machines using deep learning, IEEE Transactions on Neural Networks and Learning Systems, 2018.

I. Steinwart, P. Thomann, and N. Schmid, Learning with hierarchical Gaussian kernels, 2016.

S. Sun, J. Zhao, and J. Zhu, A review of Nyström methods for large-scale machine learning, Information Fusion, vol.26, pp.36-48, 2015.

A. Vedaldi and A. Zisserman, Efficient additive kernels via explicit feature maps, IEEE Trans. Pattern Anal. Mach. Intell., vol.34, issue.3, pp.480-492, 2012.

C. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems 13, pp.682-688, 2001.

A. G. Wilson, Z. Hu, R. Salakhutdinov, and E. P. Xing, Deep kernel learning, Artificial Intelligence and Statistics, pp.370-378, 2016.

Z. Yang, M. Moczulski, M. Denil, N. de Freitas, A. Smola et al., Deep fried convnets, 2015 IEEE International Conference on Computer Vision (ICCV), pp.1476-1483, 2015.

S. Zhang, J. Li, P. Xie, Y. Zhang, M. Shao et al., Stacked kernel network, 2017.