P. Agrawal, J. Carreira, and J. Malik, Learning to see by moving, Proceedings of the International Conference on Computer Vision (ICCV, p.3, 2015.

Y. M. Asano, C. Rupprecht, and A. Vedaldi, Self-labelling via simultaneous clustering and representation learning, International Conference on Learning Representations (ICLR, vol.20, p.21, 2020.

P. Bachman, R. D. Hjelm, and W. Buchwalter, Learning representations by maximizing mutual information across views, Advances in Neural Information Processing Systems (NeurIPS), p.16, 2019.

M. A. Bautista, A. Sanakoyeu, E. Tikhoncheva, and B. Ommer, Cliquecnn: Deep unsupervised exemplar learning, Advances in Neural Information Processing Systems (NeurIPS), p.3, 2016.

P. Bojanowski and A. Joulin, Unsupervised learning by predicting noise, Proceedings of the International Conference on Machine Learning (ICML, vol.2, p.18, 2017.

N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov et al., End-to-end object detection with transformers, vol.6, p.18, 2020.

M. Caron, P. Bojanowski, A. Joulin, and M. Douze, Deep clustering for unsupervised learning of visual features, Proceedings of the European Conference on Computer Vision (ECCV), vol.19, p.20, 2018.

M. Caron, P. Bojanowski, J. Mairal, and A. Joulin, Unsupervised pre-training of image features on non-curated data, Proceedings of the International Conference on Computer Vision (ICCV, vol.3, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02119564

M. Caron, A. Morcos, P. Bojanowski, J. Mairal, and A. Joulin, Pruning convolutional neural networks with self-supervision, p.9, 2020.
URL : https://hal.archives-ouvertes.fr/hal-02883772

T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, A simple framework for contrastive learning of visual representations, vol.16, p.17, 2020.

X. Chen, H. Fan, R. Girshick, and K. He, Improved baselines with momentum contrastive learning, vol.5, p.18, 2020.

E. D. Cubuk, B. Zoph, J. Shlens, and Q. V. Le, Randaugment: Practical data augmentation with no separate search, p.6, 2019.

M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, Advances in Neural Information Processing Systems (NeurIPS) (2013) 3, 4, vol.5, p.19

C. Doersch, A. Gupta, and A. A. Efros, Unsupervised visual representation learning by context prediction, Proceedings of the International Conference on Computer Vision (ICCV, p.3, 2015.

J. Donahue and K. Simonyan, Large scale adversarial representation learning, Advances in Neural Information Processing Systems (NeurIPS), vol.5, p.16, 2019.

A. Dosovitskiy, P. Fischer, J. T. Springenberg, M. Riedmiller, and T. Brox, Discriminative unsupervised feature learning with exemplar convolutional neural networks, IEEE transactions on pattern analysis and machine intelligence, vol.38, pp.1734-1747, 2016.

M. Everingham, L. Van-gool, C. K. Williams, J. Winn, and A. Zisserman, The pascal visual object classes (voc) challenge, International journal of computer vision, vol.88, issue.2, p.7, 2010.

R. E. Fan, K. W. Chang, C. J. Hsieh, X. R. Wang, and C. J. Lin, Liblinear: A library for large linear classification, Journal of machine learning research, p.15, 2008.

S. Gidaris, A. Bursuc, N. Komodakis, P. Pérez, and M. Cord, Learning representations by predicting bags of visual words, vol.3, p.18, 2020.
URL : https://hal.archives-ouvertes.fr/hal-02505058

S. Gidaris, P. Singh, and N. Komodakis, Unsupervised representation learning by predicting image rotations, International Conference on Learning Representations (ICLR), p.16, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01864755

P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski et al., Accurate, large minibatch sgd: Training imagenet in 1 hour, p.14, 2017.

M. Gutmann and A. Hyvärinen, Noise-contrastive estimation: A new estimation principle for unnormalized statistical models, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, p.2, 2010.

R. Hadsell, S. Chopra, and Y. Lecun, Dimensionality reduction by learning an invariant mapping, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR, p.1, 2006.

K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, Momentum contrast for unsupervised visual representation learning, vol.16, p.18, 2019.

K. He, R. Girshick, and P. Dollár, Rethinking imagenet pre-training, Proceedings of the International Conference on Computer Vision (ICCV), vol.7, p.15, 2019.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR, p.6, 2016.

O. J. Hénaff, A. Razavi, C. Doersch, S. Eslami, and A. V. Oord, Data-efficient image recognition with contrastive predictive coding, vol.5, p.16, 2019.

R. D. Hjelm, A. Fedorov, S. Lavoie-marchildon, K. Grewal, P. Bachman et al., Learning deep representations by mutual information estimation and maximization, International Conference on Learning Representations (ICLR, 2019.

J. Huang, Q. Dong, and S. Gong, Unsupervised deep learning by neighbourhood discovery, Proceedings of the International Conference on Machine Learning (ICML), p.3, 2019.

S. Jenni and P. Favaro, Self-supervised feature learning by learning to spot artifacts, Proceedings of the Conference on Computer Vision and Pattern Recognition, p.3, 2018.

L. Jing and Y. Tian, Self-supervised visual feature learning with deep neural networks: A survey, p.3, 2019.

D. Kim, D. Cho, D. Yoo, and I. S. Kweon, Learning image representations by completing damaged jigsaw puzzles, Winter Conference on Applications of Computer Vision, p.3, 2018.

A. Kolesnikov, X. Zhai, and L. Beyer, Revisiting self-supervised visual representation learning, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR, p.6, 2019.

G. Larsson, M. Maire, and G. Shakhnarovich, Learning representations for automatic colorization, Proceedings of the European Conference on Computer Vision (ECCV, p.3, 2016.

J. Li, P. Zhou, C. Xiong, R. Socher, and S. C. Hoi, Prototypical contrastive learning of unsupervised representations, vol.5, p.19, 2020.

T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft coco: Common objects in context, Proceedings of the European Conference on Computer Vision (ECCV, vol.6, p.17, 2014.

C. Liu, P. Dollár, K. He, R. Girshick, A. Yuille et al., Are labels necessary for neural architecture search?, p.9, 2020.

I. Loshchilov and F. Hutter, Sgdr: Stochastic gradient descent with warm restarts, vol.6, p.15, 2016.

D. Mahajan, R. Girshick, V. Ramanathan, K. He, M. Paluri et al., Exploring the limits of weakly supervised pretraining, Proceedings of the European Conference on Computer Vision (ECCV), p.8, 2018.

A. Mahendran, J. Thewlis, and A. Vedaldi, Cross pixel optical flow similarity for self-supervised learning, p.3, 2018.

P. Micikevicius, S. Narang, J. Alben, G. Diamos, E. Elsen et al., Mixed precision training, p.14, 2017.

I. Misra and L. Van-der-maaten, Self-supervised learning of pretext-invariant representations, vol.17, p.18, 2019.

I. Misra, C. L. Zitnick, and M. Hebert, Shuffle and learn: unsupervised learning using temporal order verification, Proceedings of the European Conference on Computer Vision (ECCV, p.3, 2016.

M. Noroozi and P. Favaro, Unsupervised learning of visual representations by solving jigsaw puzzles, Proceedings of the European Conference on Computer Vision (ECCV, vol.3, p.5, 2016.

A. Oord, Y. Li, and O. Vinyals, Representation learning with contrastive predictive coding, 2018.

D. Pathak, R. Girshick, P. Dollár, T. Darrell, and B. Hariharan, Learning features by watching objects move, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR, p.3, 2017.

D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, Context encoders: Feature learning by inpainting, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), p.3, 2016.

S. Ren, K. He, R. Girshick, and J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems (NeurIPS) (2015) 6, 7, 15, vol.17, p.18

K. Sohn, D. Berthelot, C. L. Li, Z. Zhang, N. Carlini et al., Fixmatch: Simplifying semi-supervised learning with consistency and confidence, p.6, 2020.

Y. Tian, D. Krishnan, and P. Isola, Contrastive multiview coding, p.16, 2019.

H. Touvron, A. Vedaldi, M. Douze, and H. Jégou, Fixing the train-test resolution discrepancy, Advances in Neural Information Processing Systems (NeurIPS), 2019.

G. Van-horn, O. Mac-aodha, Y. Song, Y. Cui, C. Sun et al., The inaturalist species classification and detection dataset, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), p.7, 2018.

X. Wang and A. Gupta, Unsupervised learning of visual representations using videos, Proceedings of the International Conference on Computer Vision (ICCV, p.3, 2015.

X. Wang, K. He, and A. Gupta, Transitive invariance for self-supervised visual representation learning, Proceedings of the International Conference on Computer Vision (ICCV, p.3, 2017.

Y. Wu, A. Kirillov, F. Massa, W. Y. Lo, and R. Girshick, , p.15, 2019.

Z. Wu, Y. Xiong, S. X. Yu, and D. Lin, Unsupervised feature learning via non-parametric instance discrimination, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), vol.19, p.21, 2018.

J. Xie, R. Girshick, and A. Farhadi, Unsupervised deep embedding for clustering analysis, Proceedings of the International Conference on Machine Learning (ICML), p.3, 2016.

Q. Xie, Z. D. Dai, E. Hovy, M. T. Luong, and Q. V. Le, Unsupervised data augmentation for consistency training, p.6, 2020.

S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, Aggregated residual transformations for deep neural networks, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR, p.8, 2017.