A. Azzalini and A. Dalla Valle, The multivariate skew-normal distribution, Biometrika, vol.83, pp.715-726, 1996.

P. A. Bromiley, Products and convolutions of Gaussian probability density functions, 2014.

F. Chamroukhi, Robust mixture of experts modeling using the t distribution, Neural Networks, vol.79, pp.20-36, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01479309

F. Chamroukhi, Skew t mixture of experts, Neurocomputing, vol.266, pp.390-408, 2017.

F. Chamroukhi, S. Mohammed, D. Trabelsi, L. Oukhellou, and Y. Amirat, Joint segmentation of multivariate time series with hidden process regression for human activity recognition, Neurocomputing, vol.120, pp.633-644, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00864854

K. Chen, L. Xu, and H. Chi, Improved learning algorithms for mixture of experts in multiclass classification, Neural Networks, vol.12, pp.1229-1252, 1999.

W. Cheney and W. Light, A Course in Approximation Theory, 2000.

J. Chiou, Y. Chen, and Y. Yang, Multivariate functional principal component analysis: a normalization approach, Statistica Sinica, vol.24, pp.1571-1596, 2014.

A. Dasgupta, Asymptotic Theory of Statistics and Probability, 2008.

A. Deleforge, F. Forbes, and R. Horaud, Acoustic space learning for sound-source separation and localization on binaural manifolds, International Journal of Neural Systems, vol.25, p.1440003, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00960796

A. Deleforge, F. Forbes, and R. Horaud, High-dimensional regression with Gaussian mixtures and partially-latent response variables, Statistics and Computing, vol.25, pp.893-911, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01107604

H. Fu, M. Gong, C. Wang, and D. Tao, MoE-SPNet: a mixture of experts scene parsing network, Pattern Recognition, vol.84, pp.226-236, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02086416

J. Geweke and M. Keane, Smoothly mixing regressions, Journal of Econometrics, vol.138, pp.252-290, 2007.

B. Grün, I. Kosmidis, and A. Zeileis, Extended beta regression in R: shaken, stirred, mixed, and partitioned, Journal of Statistical Software, vol.48, pp.1-25, 2012.

B. Grün and F. Leisch, FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters, Journal of Statistical Software, vol.28, pp.1-35, 2008.

S. Ingrassia, S. C. Minotti, and A. Punzo, Model-based clustering via linear cluster-weighted models, Computational Statistics and Data Analysis, vol.71, pp.159-182, 2014.

S. Ingrassia, S. C. Minotti, and G. Vittadini, Local statistical modeling via a cluster-weighted approach with elliptical distributions, Journal of Classification, vol.29, pp.363-401, 2012.

R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, Adaptive mixtures of local experts, Neural Computation, vol.3, pp.79-87, 1991.

W. Jiang and M. A. Tanner, Hierarchical mixtures-of-experts for exponential family regression models: approximation and maximum likelihood estimation, Annals of Statistics, vol.27, pp.987-1011, 1999.

W. Jiang and M. A. Tanner, On the approximation rate of hierarchical mixtures-of-experts for generalized linear models, Neural Computation, vol.11, pp.1183-1198, 1999.

M. I. Jordan and R. A. Jacobs, Hierarchical mixtures of experts and the EM algorithm, Neural Computation, vol.6, pp.181-214, 1994.

M. I. Jordan and L. Xu, Convergence results for the EM approach to mixtures of experts architectures, Neural Networks, vol.8, pp.1409-1431, 1995.

L. Kalliovirta, M. Meitz, and P. Saikkonen, Gaussian mixture vector autoregression, Journal of Econometrics, vol.192, pp.485-498, 2016.

S. Masoudnia and R. Ebrahimpour, Mixture of experts: a literature survey, Artificial Intelligence Review, vol.42, pp.275-293, 2014.

E. F. Mendes and W. Jiang, On convergence rates of mixture of polynomial experts, Neural Computation, vol.24, pp.3025-3051, 2012.

L. Montuelle and E. Le Pennec, Mixture of Gaussian regressions model with logistic weights, a penalized maximum likelihood approach, Electronic Journal of Statistics, vol.8, pp.1661-1695, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01101483

H. D. Nguyen and F. Chamroukhi, Practical and theoretical aspects of mixture-of-experts modeling: an overview, WIREs Data Mining and Knowledge Discovery, p.1246, 2018.

H. D. Nguyen, L. R. Lloyd-Jones, and G. J. McLachlan, A universal approximation theorem for mixture-of-experts models, Neural Computation, vol.28, pp.2585-2593, 2016.

H. D. Nguyen and G. J. McLachlan, Laplace mixture of linear experts, Computational Statistics and Data Analysis, vol.93, pp.177-191, 2016.

A. Norets, Approximation of conditional densities by smooth mixtures of regressions, Annals of Statistics, vol.38, pp.1733-1766, 2010.

A. Norets and D. Pati, Adaptive Bayesian estimation of conditional densities, Econometric Theory, vol.33, pp.980-1012, 2017.

A. Norets and J. Pelenis, Bayesian modeling of joint and conditional distributions, Journal of Econometrics, vol.168, pp.332-346, 2012.

A. Norets and J. Pelenis, Posterior consistency in conditional density estimation by covariate dependent mixtures, Econometric Theory, vol.30, pp.606-646, 2014.

J. T. Oden and L. F. Demkowicz, Applied Functional Analysis, 2010.

J. Pelenis, Bayesian regression with heteroscedastic error density and parametric mean function, Journal of Econometrics, vol.178, pp.624-638, 2014.

E. Perthame, F. Forbes, and A. Deleforge, Inverse regression approach to robust nonlinear high-to-low dimensional mapping, Journal of Multivariate Analysis, vol.163, pp.1-14, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01347455

A. Pinkus, Ridge Functions, 2015.

D. Pollard, A User's Guide to Measure Theoretic Probability, 2002.

R. Prado, F. Molina, and G. Huerta, Multivariate time series modeling and classification via hierarchical VAR mixture, Computational Statistics and Data Analysis, vol.51, pp.1445-1462, 2006.

N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le et al., Outrageously large neural networks: the sparsely-gated mixture-of-experts layer, Proceedings of the International Conference on Learning Representations, 2017.

C. Smorynski, Logical Number Theory I: An Introduction, 1991.

M. H. Stone, The generalized Weierstrass approximation theorem, Mathematics Magazine, vol.21, pp.237-254, 1948.

…onal least-squares learning, IEEE Transactions on Neural Networks, vol.3, pp.807-814.

L. Xu, M. I. Jordan, and G. E. Hinton, An alternative model for mixtures of experts, Advances in Neural Information Processing Systems, pp.633-640, 1995.

S. E. Yuksel, J. N. Wilson, and P. D. Gader, Twenty years of mixture of experts, IEEE Transactions on Neural Networks and Learning Systems, vol.23, pp.1177-1193, 2012.

A. J. Zeevi, R. Meir, and V. Maiorov, Error bounds for functional approximation and estimation using mixtures of experts, IEEE Transactions on Information Theory, vol.44, pp.1010-1025, 1998.

T. Zhao, Q. Chen, Z. Kuang, J. Yu, W. Zhang et al., Deep mixture of diverse experts for large-scale visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.41, pp.1072-1087, 2018.