J. Abello, P. M. Pardalos, and M. G. C. Resende, Handbook of massive data sets, vol.4, 2013.

J. Alman and R. Williams, Probabilistic polynomials and Hamming nearest neighbors, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, pp.136-150, 2015.

A. Andoni, P. Indyk, H. L. Nguyen, and I. Razenshteyn, Beyond locality-sensitive hashing, Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pp.1018-1028, 2014.

A. Andoni and I. Razenshteyn, Optimal data-dependent hashing for approximate near neighbors, Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pp.793-801, 2015.

A. Backurs, P. Indyk, and L. Schmidt, On the fine-grained complexity of empirical risk minimization: Kernel methods and neural networks, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp.4311-4321, 2017.

M. Balcan, A. Blum, and S. Vempala, A discriminative framework for clustering via similarity functions, Proceedings of the fortieth annual ACM symposium on Theory of computing, pp.671-680, 2008.

A. Borodin, R. Ostrovsky, and Y. Rabani, Subquadratic approximation algorithms for clustering problems in high dimensional spaces, Proceedings of the thirty-first annual ACM symposium on Theory of computing, pp.435-444, 1999.

P. Breyne and M. Zabeau, Genome-wide expression analysis of plant cell cycle modulated genes, Current opinion in plant biology, vol.4, issue.2, pp.136-142, 2001.

G. Carlsson and F. Mémoli, Characterization, stability and convergence of hierarchical clustering methods, Journal of machine learning research, vol.11, pp.1425-1470, 2010.

M. Charikar and V. Chatziafratis, Approximate hierarchical clustering via sparsest cut and spreading metrics, Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.841-854, 2017.

M. Charikar, V. Chatziafratis, and R. Niazadeh, Hierarchical clustering better than average-linkage, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.2291-2304, 2019.

M. Charikar, V. Chatziafratis, R. Niazadeh, and G. Yaroslavtsev, Hierarchical clustering for Euclidean data, 2018.

K. Chen, On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications, SIAM Journal on Computing, vol.39, issue.3, pp.923-947, 2009.

M. Cochez and H. Mou, Twister tries: Approximate hierarchical agglomerative clustering for average distance in linear time, Proceedings of the 2015 ACM SIGMOD international conference on Management of data, pp.505-517, 2015.

V. Cohen-Addad, V. Kanade, and F. Mallmann-Trenn, Hierarchical clustering beyond the worst-case, Advances in Neural Information Processing Systems, pp.6201-6209, 2017.

V. Cohen-Addad, V. Kanade, F. Mallmann-Trenn, and C. Mathieu, Hierarchical clustering: Objective functions and algorithms, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.378-397, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02169539

S. Dasgupta, A cost function for similarity-based hierarchical clustering, 2015.

M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, Proceedings of the twentieth annual symposium on Computational geometry, pp.253-262, 2004.

I. Diez, P. Bonifazi, I. Escudero, B. Mateos, M. A. Muñoz et al., A novel brain partition highlights the modular skeleton shared by structure and function, Scientific reports, vol.5, p.10532, 2015.

P. Franti, O. Virmajoki, and V. Hautamaki, Fast agglomerative clustering using a k-nearest neighbor graph, IEEE transactions on pattern analysis and machine intelligence, vol.28, pp.1875-1881, 2006.

J. Friedman, T. Hastie, and R. Tibshirani, The elements of statistical learning, Springer series in statistics, vol.1, 2001.

S. Har-Peled, P. Indyk, and R. Motwani, Approximate nearest neighbor: Towards removing the curse of dimensionality, Theory of computing, vol.8, issue.1, pp.321-350, 2012.

R. Impagliazzo and R. Paturi, On the complexity of k-SAT, Journal of Computer and System Sciences, vol.62, issue.2, pp.367-375, 2001.

R. Impagliazzo, R. Paturi, and F. Zane, Which problems have strongly exponential complexity?, Journal of Computer and System Sciences, vol.63, issue.4, pp.512-530, 2001.

Y. Jeon, J. Yoo, J. Lee, and S. Yoon, Nc-link: A new linkage method for efficient hierarchical clustering of large-scale data, IEEE Access, vol.5, pp.5594-5608, 2017.

C. S. Karthik and P. Manurangsi, On closest pair in Euclidean metric: Monochromatic is as hard as bichromatic, 10th Innovations in Theoretical Computer Science Conference, ITCS 2019, vol.17, p.16, 2019.

M. Kull and J. Vilo, Fast approximate hierarchical clustering using similarity heuristics, BioData mining, vol.1, issue.1, p.9, 2008.

J. Leskovec, A. Rajaraman, and J. D. Ullman, Mining of massive datasets, 2014.

B. Moseley and J. Wang, Approximation bounds for hierarchical clustering: Average linkage, bisecting k-means, and local search, Advances in Neural Information Processing Systems, pp.3094-3103, 2017.

M. Muja and D. G. Lowe, Scalable nearest neighbor algorithms for high dimensional data, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, 2014.

F. Murtagh, A survey of recent advances in hierarchical clustering algorithms, The Computer Journal, vol.26, issue.4, pp.354-359, 1983.

F. Murtagh, Comments on 'Parallel algorithms for hierarchical clustering and cluster validity', IEEE Trans. Pattern Anal. Mach. Intell., vol.14, issue.10, pp.1056-1057, 1992.

D. Otair, Approximate k-nearest neighbour based spatial clustering using kd tree, 2013.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

A. Roy and S. Pokutta, Hierarchical clustering via spreading metrics, Advances in Neural Information Processing Systems, pp.2316-2324, 2016.

A. Rubinstein, Hardness of approximate nearest neighbor search, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp.1260-1268, 2018.

H. Schütze, C. D. Manning, and P. Raghavan, Introduction to information retrieval, vol.39, 2008.

Then, argmin_j avg(C, ?_j). We now argue that avg(C, ?_{j?}) + w_C + w_{?_{j?}} ? ?(1 ?

Consider the data structure D_{i,?}. By its correctness, D_{i,?} returned a point p? such that ||q?(C) − p?||_1 ? ?(||q?(C) − d_i(?)||_1. Thus, applying Claim 1 yields that avg_w(C, ??) + w_C + w_{??} ? ?(1 + ?)(avg(C, ?) + w_C + w_?). By the choice of j?, Let ? = argmin_{C' ? C} (avg(C, C') + w_C + w_{C'}) and ? = |?|