
…unlabeled images) with batches of 11 unlabeled images and 5 labeled images. Hyperparameter values and their scheduling over training are detailed in Table B.8.
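The mixed-batch composition above (5 labeled plus 11 unlabeled images per batch) can be sketched as follows. This is a minimal illustration, not the thesis's actual data pipeline; the function name and the use of plain Python lists in place of an image dataset are assumptions.

```python
import random

def mixed_batches(labeled, unlabeled, n_lab=5, n_unlab=11):
    # Illustrative sketch: yield batches mixing n_lab labeled samples
    # with n_unlab unlabeled samples, as in the composition above.
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    random.shuffle(labeled)
    random.shuffle(unlabeled)
    n_batches = min(len(labeled) // n_lab, len(unlabeled) // n_unlab)
    for i in range(n_batches):
        yield (labeled[i * n_lab:(i + 1) * n_lab],
               unlabeled[i * n_unlab:(i + 1) * n_unlab])

# Example: 20 labeled and 44 unlabeled samples give 4 mixed batches.
batches = list(mixed_batches(range(20), range(44)))
```

In a real training loop the labeled part of each batch would feed the supervised loss and the full batch the unsupervised one.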

- In the encoder, every layer is followed by batch normalization and a ReLU.

- In the decoder, every layer is followed by batch normalization and LeakyReLU(0.2), except the last layer, which has no activation or BN.

- In the classifiers, every intermediate layer is followed by a ReLU.

- Layers are described using the following syntax:
  - Conv: 128
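The layer conventions above can be sketched as simple layer-sequence builders. This is an illustrative sketch only: layer names and sizes (e.g. "Conv:128") are hypothetical placeholders, not the architecture's actual dimensions.

```python
def encoder_block(conv):
    # Encoder convention: every layer is followed by batch norm and ReLU.
    return [conv, "BatchNorm", "ReLU"]

def decoder_layers(convs):
    # Decoder convention: every layer is followed by batch norm and
    # LeakyReLU(0.2), except the last layer, which has neither.
    layers = []
    for i, conv in enumerate(convs):
        layers.append(conv)
        if i < len(convs) - 1:
            layers += ["BatchNorm", "LeakyReLU(0.2)"]
    return layers

encoder = encoder_block("Conv:128") + encoder_block("Conv:256")
decoder = decoder_layers(["Deconv:256", "Deconv:128", "Deconv:3"])
```

In a framework like PyTorch the same convention would translate to interleaving `nn.BatchNorm2d` and the activation after each convolution, omitting both after the decoder's output layer.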