N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.68, issue.3, pp.337-404, 1950.

L. Baringhaus and C. Franz, On a new multivariate two-sample test, Journal of Multivariate Analysis, vol.88, issue.1, pp.190-206, 2004.
DOI : 10.1016/S0047-259X(03)00079-4

URL : https://doi.org/10.1016/s0047-259x(03)00079-4

G. Blanchard, G. Lee, and C. Scott, Generalizing from several related classification tasks to a new unlabeled sample, Advances in Neural Information Processing Systems (NIPS), pp.2178-2186, 2011.

G. Blanchard, A. A. Deshmukh, U. Dogan, G. Lee, and C. Scott, Domain generalization by marginal transfer learning, 2017.

K. Borgwardt, A. Gretton, M. J. Rasch, H. Kriegel, B. Schölkopf et al., Integrating structured biological data by kernel maximum mean discrepancy, Bioinformatics, vol.22, issue.14, pp.e49-e57, 2006.
DOI : 10.1093/bioinformatics/btl242

URL : https://academic.oup.com/bioinformatics/article-pdf/22/14/e49/616383/btl242.pdf

J. Cardoso, Multidimensional independent component analysis, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), pp.1941-1944, 1998.
DOI : 10.1109/ICASSP.1998.681443

URL : http://www.tsi.enst.fr/~cardoso/Papers.PDF/icassp98.pdf

C. Carmeli, E. D. Vito, A. Toigo, and V. Umanità, Vector valued reproducing kernel Hilbert spaces and universality, Analysis and Applications, vol.8, issue.01, pp.19-61, 2010.
DOI : 10.1007/s00041-007-9003-z

R. M. Dudley, Real Analysis and Probability, 2004.
DOI : 10.1017/CBO9780511755347

K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf, Kernel measures of conditional dependence, Advances in Neural Information Processing Systems (NIPS), pp.489-496, 2008.

K. Fukumizu, F. Bach, and M. Jordan, Kernel dimension reduction in regression, The Annals of Statistics, vol.37, issue.4, pp.1871-1905, 2009.
DOI : 10.1214/08-aos637

URL : http://doi.org/10.1214/08-aos637

K. Fukumizu, L. Song, and A. Gretton, Kernel Bayes' rule: Bayesian inference with positive definite kernels, Journal of Machine Learning Research, vol.14, pp.3753-3783, 2013.

A. Gretton, A simpler condition for consistency of a kernel independence test, 2015.

A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf, Measuring Statistical Dependence with Hilbert-Schmidt Norms, Algorithmic Learning Theory (ALT), pp.63-78, 2005.
DOI : 10.1007/11564089_7

URL : http://eprints.pascal-network.org/archive/00001706/01/pdf3437.pdf

A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf et al., A kernel statistical test of independence, Advances in Neural Information Processing Systems (NIPS), pp.585-592, 2008.

A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, A kernel two-sample test, Journal of Machine Learning Research, vol.13, pp.723-773, 2012.

Z. Harchaoui, F. Bach, and E. Moulines, Testing for homogeneity with kernel Fisher discriminant analysis, Advances in Neural Information Processing Systems (NIPS), pp.609-616, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00270806

W. Jitkrittum, Z. Szabó, and A. Gretton, An adaptive test of independence with analytic kernel embeddings, International Conference on Machine Learning (ICML; PMLR), pp.1742-1751, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01385111

B. Kim, R. Khanna, and O. O. Koyejo, Examples are not enough, learn to criticize! criticism for interpretability, Advances in Neural Information Processing Systems (NIPS), pp.2280-2288, 2016.

L. Klebanov, N-Distances and Their Applications, 2005.

G. Kusano, K. Fukumizu, and Y. Hiraoka, Persistence weighted Gaussian kernel for topological data analysis, International Conference on Machine Learning (ICML), pp.2004-2013, 2016.

H. C. Law, D. J. Sutherland, D. Sejdinovic, and S. Flaxman, Bayesian approaches to distribution regression, International Conference on Artificial Intelligence and Statistics (AISTATS; PMLR), pp.1167-1176, 2018.

J. R. Lloyd, D. Duvenaud, R. Grosse, J. B. Tenenbaum, and Z. Ghahramani, Automatic construction and natural-language description of nonparametric regression models, AAAI Conference on Artificial Intelligence, pp.1242-1250, 2014.

R. Lyons, Distance covariance in metric spaces, The Annals of Probability, vol.41, issue.5, pp.3284-3305, 2013.
DOI : 10.1214/12-aop803

URL : http://doi.org/10.1214/12-aop803

C. A. Micchelli, Y. Xu, and H. Zhang, Universal kernels, Journal of Machine Learning Research, vol.7, pp.2651-2667, 2006.

J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Schölkopf, Distinguishing cause from effect using observational data: Methods and benchmarks, Journal of Machine Learning Research, vol.17, pp.1-102, 2016.

K. Muandet, K. Fukumizu, F. Dinuzzo, and B. Schölkopf, Learning from distributions via support measure machines, Advances in Neural Information Processing Systems (NIPS), pp.10-18, 2011.

K. Muandet, K. Fukumizu, B. Sriperumbudur, and B. Schölkopf, Kernel Mean Embedding of Distributions: A Review and Beyond, Foundations and Trends in Machine Learning, vol.10, issue.1-2, pp.1-141, 2017.
DOI : 10.1561/2200000060

URL : http://arxiv.org/pdf/1605.09522

M. Park, W. Jitkrittum, and D. Sejdinovic, K2-ABC: Approximate Bayesian computation with kernel embeddings, International Conference on Artificial Intelligence and Statistics (AISTATS; PMLR), vol.51, pp.398-407, 2016.

N. Pfister, P. Bühlmann, B. Schölkopf, and J. Peters, Kernel-based tests for joint independence, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.80, issue.1, pp.5-31, 2017.
DOI : 10.1111/rssb.12235

URL : http://onlinelibrary.wiley.com/doi/10.1111/rssb.12235/pdf

N. Quadrianto, L. Song, and A. Smola, Kernelized Sorting, Advances in Neural Information Processing Systems (NIPS), pp.1289-1296, 2009.
DOI : 10.1109/TPAMI.2009.184

B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, 2002.

B. Schölkopf, K. Muandet, K. Fukumizu, S. Harmeling, and J. Peters, Computing functions of random variables via reproducing kernel Hilbert space representations, Statistics and Computing, vol.25, issue.4, pp.755-766, 2015.
DOI : 10.1137/0114046

D. Sejdinovic, A. Gretton, and W. Bergsma, A kernel test for three-variable interactions, Advances in Neural Information Processing Systems (NIPS), pp.1124-1132, 2013.

D. Sejdinovic, B. K. Sriperumbudur, A. Gretton, and K. Fukumizu, Equivalence of distance-based and RKHS-based statistics in hypothesis testing, The Annals of Statistics, vol.41, issue.5, pp.2263-2291, 2013.
DOI : 10.1214/13-AOS1140

URL : http://doi.org/10.1214/13-aos1140

C. Simon-Gabriel and B. Schölkopf, Kernel distribution embeddings: Universal kernels, characteristic kernels and kernel metrics on distributions, Max Planck Institute for Intelligent Systems, 2016.

A. Smola, A. Gretton, L. Song, and B. Schölkopf, A Hilbert space embedding for distributions, Algorithmic Learning Theory (ALT), pp.13-31, 2007.
DOI : 10.1007/978-3-540-75225-7_5

URL : http://www.kyb.tuebingen.mpg.de/publications/attachments/ALT-2007-Gretton_%5B0%5D.pdf

L. Song, A. Gretton, D. Bickson, Y. Low, and C. Guestrin, Kernel belief propagation, International Conference on Artificial Intelligence and Statistics (AISTATS), pp.707-715, 2011.

L. Song, A. Smola, A. Gretton, J. Bedo, and K. Borgwardt, Feature selection via dependence maximization, Journal of Machine Learning Research, vol.13, pp.1393-1434, 2012.

B. K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. R. Lanckriet, Hilbert space embeddings and metrics on probability measures, Journal of Machine Learning Research, vol.11, pp.1517-1561, 2010.

B. K. Sriperumbudur, K. Fukumizu, and G. R. Lanckriet, Universality, characteristic kernels and RKHS embedding of measures, Journal of Machine Learning Research, vol.12, pp.2389-2410, 2011.

I. Steinwart, On the influence of the kernel on the consistency of support vector machines, Journal of Machine Learning Research, vol.2, pp.67-93, 2001.

I. Steinwart and A. Christmann, Support Vector Machines, 2008.

E. V. Strobl, S. Visweswaran, and K. Zhang, Approximate kernel-based conditional independence tests for fast non-parametric causal discovery, 2017.

Z. Szabó, B. Sriperumbudur, B. Póczos, and A. Gretton, Learning theory for distribution regression, Journal of Machine Learning Research, vol.17, issue.152, pp.1-40, 2016.

G. J. Székely and M. L. Rizzo, Testing for equal distributions in high dimension, InterStat, vol.5, 2004.

G. J. Székely and M. L. Rizzo, A new test for multivariate normality, Journal of Multivariate Analysis, vol.93, issue.1, pp.58-80, 2005.
DOI : 10.1016/j.jmva.2003.12.002

G. J. Székely and M. L. Rizzo, Brownian distance covariance, The Annals of Applied Statistics, vol.3, issue.4, pp.1236-1265, 2009.
DOI : 10.1214/09-AOAS312

G. J. Székely, M. L. Rizzo, and N. K. Bakirov, Measuring and testing dependence by correlation of distances, The Annals of Statistics, vol.35, issue.6, pp.2769-2794, 2007.

H. Wendland, Scattered Data Approximation, Cambridge Monographs on Applied and Computational Mathematics, 2005.
DOI : 10.1017/CBO9780511617539

M. Yamada, Y. Umezu, K. Fukumizu, and I. Takeuchi, Post selection inference with kernels, International Conference on Artificial Intelligence and Statistics (AISTATS; PMLR), pp.152-160, 2018.

M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Póczos, R. R. Salakhutdinov et al., Deep sets, Advances in Neural Information Processing Systems (NIPS), pp.3394-3404, 2017.

K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang, Domain adaptation under target and conditional shift, International Conference on Machine Learning (ICML; PMLR), vol.28, issue.3, pp.819-827, 2013.

A. A. Zinger, A. V. Kakosyan, and L. B. Klebanov, A characterization of distributions by mean values of statistics and certain probabilistic metrics, Journal of Soviet Mathematics, vol.200, issue.4, 1992.
DOI : 10.1007/BF01099119