AC Discovery, http://www.ac-discovery.com, Q2/08, 19080. ASDI, http://www.asdi, p.9069

Biotech Corp of America, http://www.biotech-us.com, Q2/08, >120000. Cerep, http://www.cerep.fr, Q2/08, >16500. ChemBridge, http, p.3870

Princeton BioMolecular, com, Q2/08, >500000. Pyxis Discovery, http://www.pyxis-discovery, p.964619

A. R. Leach and M. M. Hann, The in silico world of virtual libraries, Drug Discovery Today, vol.5, issue.8, pp.326-336, 2000.
DOI : 10.1016/S1359-6446(00)01516-6

J. A. Lumley, Compound Selection and Filtering in Library Design, QSAR & Combinatorial Science, vol.24, issue.8, pp.1066-1075, 2005.
DOI : 10.1002/qsar.200520136

A. D. Gorse, Diversity in Medicinal Chemistry Space, Current Topics in Medicinal Chemistry, vol.6, issue.1, pp.3-18, 2006.
DOI : 10.2174/156802606775193310

C. A. Lipinski and A. Hopkins, Navigating chemical space for biology and medicine, Nature, vol.432, issue.7019, pp.855-861, 2004.
DOI : 10.1038/nature03193

C. M. Dobson, Chemical space and biology, Nature, vol.432, issue.7019, pp.824-828, 2004.
DOI : 10.1038/nature03192

P. Ertl, Cheminformatics Analysis of Organic Substituents: Identification of the Most Common Substituents, Calculation of Substituent Properties, and Automatic Identification of Drug-like Bioisosteric Groups, Journal of Chemical Information and Computer Sciences, vol.43, issue.2, pp.374-380, 2003.
DOI : 10.1021/ci0255782

J. J. Irwin and B. K. Shoichet, ZINC - A Free Database of Commercially Available Compounds for Virtual Screening, Journal of Chemical Information and Modeling, vol.45, issue.1, pp.177-182, 2005.
DOI : 10.1021/ci049714+

E. Proschak, J. K. Wegner, A. Schüller, G. Schneider, and U. Fechner, Molecular Query Language (MQL): A Context-Free Grammar for Substructure Matching, Journal of Chemical Information and Modeling, vol.47, issue.2, pp.295-301, 2007.
DOI : 10.1021/ci600305h

P. Ertl and S. Jelfs, Designing Drugs on the Internet? Free Web Tools and Services Supporting Medicinal Chemistry, Current Topics in Medicinal Chemistry, vol.7, issue.15, pp.1491-1501, 2007.
DOI : 10.2174/156802607782194707

P. Lind and M. Alm, A Database-Centric Virtual Chemistry System, Journal of Chemical Information and Modeling, vol.46, issue.3, pp.1034-1039, 2006.
DOI : 10.1021/ci050360b

R. Todeschini and V. Consonni, Handbook of Molecular Descriptors, E.A. QSAR Comb. Sci, vol.25, pp.1133-1142, 2006.
DOI : 10.1002/9783527613106

Q. Liao, J. Yao, and S. Yuan, SVM approach for predicting LogP, Molecular Diversity, vol.10, issue.3, pp.301-309, 2006.
DOI : 10.1007/s11030-006-9036-2

J. Gola, O. Obrezanova, E. Champness, and M. Segall, ADMET Property Prediction: The State of the Art and Current Challenges, QSAR & Combinatorial Science, vol.25, issue.12, pp.1172-1180, 2006.
DOI : 10.1002/qsar.200610093

D. Bonchev, Information Theoretic Indices for Characterization of Chemical Structures, 1983.

P. Baldi, R. W. Benz, D. S. Hirschberg, and S. J. Swamidass, Lossless Compression of Chemical Fingerprints Using Integer Entropy Codes Improves Storage and Retrieval, Journal of Chemical Information and Modeling, vol.47, issue.6, pp.2098-2109, 2007.
DOI : 10.1021/ci700200n

U. Fechner, J. Paetz, and G. Schneider, Comparison of Three Holographic Fingerprint Descriptors and their Binary Counterparts, QSAR & Combinatorial Science, vol.24, issue.8, pp.961-967, 2005.
DOI : 10.1002/qsar.200530118

J. W. Godden, F. L. Stahura, and J. Bajorath, Anatomy of Fingerprint Search Calculations on Structurally Diverse Sets of Active Compounds, Journal of Chemical Information and Modeling, vol.45, issue.6, pp.1812-1819, 2005.
DOI : 10.1021/ci050276w

A. Varnek, D. Fourches, F. Hoonakker, and V. P. Solov'ev, Substructural fragments: an universal language to encode reactions, molecular and supramolecular structures, Journal of Computer-Aided Molecular Design, vol.19, issue.9-10, pp.693-703, 2005.
DOI : 10.1007/s10822-005-9008-0

Y. Zyrianov, Distribution-Based Descriptors of the Molecular Shape, Journal of Chemical Information and Modeling, vol.45, issue.3, pp.657-672, 2005.
DOI : 10.1021/ci050005l

Absolv, Sirius Analytical Instruments Ltd, East Sussex, UK.

QikProp, Schrödinger, Inc.

QMPRPLUS, Simulations Plus, 93534-2902.

J. Gola and O. Obrezanova, ADMET Property Prediction: The State of the Art and Current Challenges, QSAR & Combinatorial Science, vol.25, issue.12, pp.1172-1180, 2006.
DOI : 10.1002/qsar.200610093

M. P. Gleeson, Generation of a Set of Simple, Interpretable ADMET Rules of Thumb, Journal of Medicinal Chemistry, vol.51, issue.4, pp.817-834, 2008.
DOI : 10.1021/jm701122q

Y. C. Martin, A Bioavailability Score, Journal of Medicinal Chemistry, vol.48, issue.9, pp.3164-3170, 2005.
DOI : 10.1021/jm0492002

M. C. Hutter, Separating Drugs from Nondrugs: A Statistical Approach Using Atom Pair Distributions, Journal of Chemical Information and Modeling, vol.47, issue.1, pp.186-194, 2007.
DOI : 10.1021/ci600329u

G. M. Rishton, Nonleadlikeness and leadlikeness in biochemical screening, Drug Discovery Today, vol.8, issue.2, pp.86-96, 2003.
DOI : 10.1016/S1359-6446(02)02572-2

G. M. Rishton, Reactive compounds and in vitro false positives in HTS, Drug Discovery Today, vol.2, issue.9, pp.382-384, 1997.
DOI : 10.1016/S1359-6446(97)01083-0

S. L. McGovern, E. Caselli, N. Grigorieff, and B. K. Shoichet, A Common Mechanism Underlying Promiscuous Inhibitors from Virtual and High-Throughput Screening, Journal of Medicinal Chemistry, vol.45, issue.8, pp.1712-1722, 2002.
DOI : 10.1021/jm010533y

J. Seidler, S. L. McGovern, T. N. Doman, and B. K. Shoichet, Identification and Prediction of Promiscuous Aggregating Inhibitors among Known Drugs, Journal of Medicinal Chemistry, vol.46, issue.21, pp.4477-4486, 2003.
DOI : 10.1021/jm030191r

J. K. Wegner, H. J. van Vlijmen, I. T. Nabney, B. S. Williams, A. Sewing et al., Self-Organizing Maps; C. M. Bishop, Neural Networks for Pattern Recognition, Adv. Neural Inf. Proc. Syst, pp.1279-1293, 1969.

R. W. Spencer, H. Matter, Y. C. Martin, J. L. Kofron et al., Concepts and Applications of Molecular Similarity, J. Biomol. Screen.; J. Med. Chem.; J. Med. Chem.; J. Doucet and M. Petitjean, Mol. Divers, vol.46142145146, issue.10, pp.1094-1097, 1990.

M. L. Brewer, Development of a Spectral Clustering Method for the Analysis of Molecular Data Sets, Journal of Chemical Information and Modeling, vol.47, issue.5, pp.1727-1733, 2007.
DOI : 10.1021/ci600565r

Y. Wang and J. Bajorath, Balancing the Influence of Molecular Complexity on Fingerprint Similarity Searching, Journal of Chemical Information and Modeling, vol.48, issue.1, pp.75-84, 2008.
DOI : 10.1021/ci700314x

A. Schuffenhauer, N. Brown, P. Selzer, P. Ertl, and E. Jacoby, Relationships between Molecular Complexity, Biological Activity, and Structural Diversity, Journal of Chemical Information and Modeling, vol.46, issue.2, pp.525-535, 2006.
DOI : 10.1021/ci0503558

D. M. Bayada, H. Hamersma, and V. J. van Geerestein, Molecular Diversity and Representativity in Chemical Databases, Journal of Chemical Information and Computer Sciences, vol.39, issue.1, pp.1-10, 1999.
DOI : 10.1021/ci980109e

M. Landon and S. Schaus, JEDA: Joint entropy diversity analysis. An information-theoretic method for choosing diverse and representative subsets from combinatorial libraries, Molecular Diversity, vol.10, issue.3, pp.333-339, 2006.
DOI : 10.1007/s11030-006-9042-4

X. Q. Xie and J. Chen, Data Mining a Small Molecule Drug Screening Representative Subset from NIH PubChem, Journal of Chemical Information and Modeling, vol.48, issue.3, pp.465-475, 2008.
DOI : 10.1021/ci700193u

W. Li, A Fast Clustering Algorithm for Analyzing Highly Similar Compounds of Very Large Libraries, Journal of Chemical Information and Modeling, vol.46, issue.5, pp.1919-1923, 2006.
DOI : 10.1021/ci0600859

Outliers. We therefore focus on the study of the mean over the runs. 4.4

Influence of the sampling size of a dataset with outliers (500 and 100): distribution of the pairwise distances between centers in the sample (SDTot), pp.100-103

As with the k-center method, the Maximum-Dissimilarity method produces samples, drawn from datasets with or without outliers, that have similar maximum and mean radii. For the other criteria, the sample drawn from the dataset without outliers gives lower values than the sample drawn from the dataset with outliers. Despite these lower values, the ratios between criteria are preserved whatever the type of starting dataset.

(Annex E) Next, the three datasets give similar values for the criteria (see Table 4.73); only the maximum values differ between datasets.

D. Weininger, SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules, Journal of Chemical Information and Computer Sciences, vol.28, issue.1, pp.31-36, 1988.
DOI : 10.1021/ci00057a005

M. M. Hann and T. I. Oprea, Pursuing the leadlikeness concept in pharmaceutical research, Current Opinion in Chemical Biology, vol.8, issue.3, pp.255-263, 2004.
DOI : 10.1016/j.cbpa.2004.04.003

R. S. Bohacek, C. McMartin, and W. C. Guida, The art and practice of structure-based drug design: A molecular modeling perspective, Medicinal Research Reviews, vol.16, issue.1, pp.3-50, 1996.
DOI : 10.1002/(SICI)1098-1128(199601)16:1<3::AID-MED1>3.0.CO;2-6

P. Ertl, Cheminformatics Analysis of Organic Substituents: Identification of the Most Common Substituents, Calculation of Substituent Properties, and Automatic Identification of Drug-like Bioisosteric Groups, Journal of Chemical Information and Computer Sciences, vol.43, issue.2, pp.374-380, 2003.
DOI : 10.1021/ci0255782

T. Scior, P. Bernard, J. L. Medina-Franco, and G. M. Maggiora, Large compound databases for structure-activity relationships studies in drug discovery

J. Gasteiger, Handbook of Chemoinformatics: From Data to Knowledge in 4 Volumes, 2003.
DOI : 10.1002/9783527618279

J. J. Irwin and B. K. Shoichet, ZINC - A Free Database of Commercially Available Compounds for Virtual Screening, Journal of Chemical Information and Modeling, vol.45, issue.1, pp.177-182, 2005.
DOI : 10.1021/ci049714+

J. Chen, S. J. Swamidass, Y. Dou, J. Bruand, and P. Baldi, ChemDB: a public database of small molecules and related chemoinformatics resources, Bioinformatics, vol.21, issue.22, pp.4133-4139, 2005.
DOI : 10.1093/bioinformatics/bti683

C. P. Austin, L. S. Brady, T. R. Insel, and F. S. Collins, MOLECULAR BIOLOGY: NIH Molecular Libraries Initiative, Science, vol.306, issue.5699, p.1138, 2004.
DOI : 10.1126/science.1105511

I. Tetko, J. Gasteiger, R. Todeschini, A. Mauri, D. Livingstone et al., Virtual Computational Chemistry Laboratory: Design and Description, Journal of Computer-Aided Molecular Design, vol.19, issue.6, pp.453-463, 2005.
DOI : 10.1007/s10822-005-8694-y

K. P. Seiler, G. A. George, M. P. Happ, N. E. Bodycombe, H. A. Carrinski et al., ChemBank: a small-molecule screening and cheminformatics resource database, Nucleic Acids Research, vol.36, Database issue, pp.1-9, 2007.
DOI : 10.1093/nar/gkm843

T. Girke, L. C. Cheng, and N. Raikhel, ChemMine. A Compound Mining Database for Chemical Genomics, Plant Physiology, vol.138, issue.2, p.573, 2005.
DOI : 10.1104/pp.105.062687

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1150377

C. Brooksbank, G. Cameron, and J. Thornton, ChEBI: Chemical entities of biological interest, Nucleic Acids Research, vol.33, 2005.

W. Lutz, Current status of virtual combinatorial library design, QSAR Comb. Sci, vol.24, issue.7, pp.809-823, 2005.

D. K. Agrafiotis and E. J. Martin, Advances in combinatorial library design, J. Mol. Graph. Model, vol.18, issue.4, 2000.

P. Sharma, S. Salapaka, and C. Beck, A Scalable Approach to Combinatorial Library Design for Drug Discovery, Journal of Chemical Information and Modeling, vol.48, issue.1, pp.27-41, 2008.
DOI : 10.1021/ci700023y

T. Fink and J. L. Reymond, Virtual Exploration of the Chemical Universe up to 11 Atoms of C, N, O, F: Assembly of 26.4 Million Structures (110.9 Million Stereoisomers) and Analysis for New Ring Systems, Stereochemistry, Physicochemical Properties, Compound Classes, and Drug Discovery, Journal of Chemical Information and Modeling, vol.47, issue.2, pp.342-353, 2007.
DOI : 10.1021/ci600423u

L. Arve, T. Voigt, and H. Waldmann, Charting biological and chemical space: PSSC and SCONP as guiding principles for the development of compound collections based on natural product scaffolds, QSAR Comb. Sci, vol.25, issue.5-6, p.449, 2006.

Q. Liao, J. Yao, and S. Yuan, SVM approach for predicting LogP, Molecular Diversity, vol.10, issue.3, pp.301-309, 2006.
DOI : 10.1007/s11030-006-9036-2

J. Gola, O. Obrezanova, E. Champness, and M. Segall, ADMET Property Prediction: The State of the Art and Current Challenges, QSAR & Combinatorial Science, vol.25, issue.12, pp.1172-1180, 2006.
DOI : 10.1002/qsar.200610093

C. Steinbeck, Y. Han, S. Kuhn, O. Horlacher, E. Luttmann et al., The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics, Journal of Chemical Information and Computer Sciences, vol.43, issue.2, pp.493-500, 2003.
DOI : 10.1021/ci025584y

R. Todeschini and V. Consonni, Handbook of Molecular Descriptors, 2000.
DOI : 10.1002/9783527613106

J. A. Haigh, B. T. Pickup, J. A. Grant, and A. Nicholls, Small Molecule Shape-Fingerprints, Journal of Chemical Information and Modeling, vol.45, issue.3, pp.673-684, 2005.
DOI : 10.1021/ci049651v

U. Fechner, J. Paetz, and G. Schneider, Comparison of three holographic fingerprint descriptors and their binary counterparts, QSAR & Combinatorial Science, vol.24, pp.961-967, 2005.

J. W. Godden, F. L. Stahura, and J. Bajorath, Anatomy of Fingerprint Search Calculations on Structurally Diverse Sets of Active Compounds, Journal of Chemical Information and Modeling, vol.45, issue.6, pp.1812-1819, 2005.
DOI : 10.1021/ci050276w

D. C. Whitley, M. G. Ford, and D. J. Livingstone, Unsupervised Forward Selection: A Method for Eliminating Redundant Variables, Journal of Chemical Information and Computer Sciences, vol.40, issue.5, pp.1160-1168, 2000.
DOI : 10.1021/ci000384c

P. Baldi, R. W. Benz, D. S. Hirschberg, and S. J. Swamidass, Lossless Compression of Chemical Fingerprints Using Integer Entropy Codes Improves Storage and Retrieval, Journal of Chemical Information and Modeling, vol.47, issue.6, pp.2098-2109, 2007.
DOI : 10.1021/ci700200n

E. Gregori-Puigjané and J. Mestres, SHED: Shannon Entropy Descriptors from Topological Feature Distributions, Journal of Chemical Information and Modeling, vol.46, issue.4, pp.1615-1622, 2006.
DOI : 10.1021/ci0600509

J. Devillers and A. T. Balaban, Topological Indices and Related Descriptors in QSAR and QSPR. Gordon and Breach, The Netherlands, 1999.

D. Bonchev, Information Theoretic Indices for Characterization of Chemical Structures, 1983.

M. Dash and H. Liu, Feature selection for clustering, PAKDD, pp.110-121, 2000.

I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, vol.3, pp.1157-1182, 2003.

E. Amaldi and V. Kann, On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems, Theor. Comput. Sci, pp.237-260, 1998.

R. Kohavi and G. John, Wrappers for feature subset selection, Artif. Intell, pp.273-324, 1997.

J. G. Dy and C. E. Brodley, Feature selection for unsupervised learning, J. Mach. Learn. Res, pp.845-889, 2004.

C. A. Lipinski, Drug-like properties and the causes of poor solubility and poor permeability, Journal of Pharmacological and Toxicological Methods, vol.44, issue.1, pp.235-249, 2000.
DOI : 10.1016/S1056-8719(00)00107-6

C. A. Lipinski, F. Lombardo, B. W. Dominy, and P. J. Feeney, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv. Drug Deliv. Rev, vol.23, pp.3-25, 1997.

W. P. Walters and M. A. Murcko, Prediction of "drug-likeness", Advanced Drug Delivery Reviews, vol.54, issue.3, pp.255-271, 2002.
DOI : 10.1016/S0169-409X(02)00003-0

I. Muegge, Selection criteria for drug-like compounds, Medicinal Research Reviews, vol.23, issue.3, pp.302-321, 2003.
DOI : 10.1002/med.10041

M. Vieth, M. G. Siegel, R. E. Higgs, I. A. Watson, D. H. Robertson et al., Characteristic Physical Properties and Structural Fragments of Marketed Oral Drugs, Journal of Medicinal Chemistry, vol.47, issue.1, pp.224-232, 2004.
DOI : 10.1021/jm030267j

C. A. Lipinski, Lead- and drug-like compounds: the rule-of-five revolution, Drug Discovery Today: Technologies, vol.1, issue.4, pp.337-341, 2004.
DOI : 10.1016/j.ddtec.2004.11.007

P. Charifson and W. Walters, Filtering databases and chemical libraries, Journal of Computer-Aided Molecular Design, vol.16, issue.5/6, pp.311-323, 2002.
DOI : 10.1023/A:1020829519597

D. F. Veber, S. R. Johnson, H. Y. Cheng, B. R. Smith, K. W. Ward, and K. D. Kopple, Molecular properties that influence the oral bioavailability of drug candidates, J. Med. Chem, vol.45, issue.12, pp.2615-2623, 2002.

T. M. Frimurer, R. Bywater, L. Narum, L. N. Lauritsen, and S. Brunak, Improving the Odds in Discriminating "Drug-like" from "Non Drug-like" Compounds, Journal of Chemical Information and Computer Sciences, vol.40, issue.6
DOI : 10.1021/ci0003810

C. A. Bergström, M. Strafford, L. Lazorova, A. Avdeef, K. Luthman et al., Absorption Classification of Oral Drugs Based on Molecular Surface Properties, Journal of Medicinal Chemistry, vol.46, issue.4
DOI : 10.1021/jm020986i

G. Vistoli, A. Pedretti, and B. Testa, Assessing drug-likeness: what are we missing?, Drug Discovery Today, vol.13, issue.7-8, pp.285-294, 2008.
DOI : 10.1016/j.drudis.2007.11.007

Absolv, Sirius Analytical Instruments Ltd.

M. P. Gleeson, Generation of a Set of Simple, Interpretable ADMET Rules of Thumb, Journal of Medicinal Chemistry, vol.51, issue.4, pp.817-834, 2008.
DOI : 10.1021/jm701122q

Y. C. Martin, A Bioavailability Score, Journal of Medicinal Chemistry, vol.48, issue.9, pp.3164-3170, 2005.
DOI : 10.1021/jm0492002

A. Monge, A. Arrault, C. Marot, and L. Morin-Allory, Managing, profiling and analyzing a library of 2.6 million compounds gathered from 32 chemical providers, Molecular Diversity, vol.10, issue.3
DOI : 10.1007/s11030-006-9033-5

URL : https://hal.archives-ouvertes.fr/hal-00079712

M. C. Hutter, Separating Drugs from Nondrugs:?? A Statistical Approach Using Atom Pair Distributions, Journal of Chemical Information and Modeling, vol.47, issue.1, pp.186-194, 2007.
DOI : 10.1021/ci600329u

G. M. Rishton, Reactive compounds and in vitro false positives in HTS, Drug Discovery Today, vol.2, issue.9, pp.382-384, 1997.
DOI : 10.1016/S1359-6446(97)01083-0

S. L. McGovern, E. Caselli, N. Grigorieff, and B. K. Shoichet, A Common Mechanism Underlying Promiscuous Inhibitors from Virtual and High-Throughput Screening, Journal of Medicinal Chemistry, vol.45, issue.8
DOI : 10.1021/jm010533y

J. Seidler, S. L. McGovern, T. N. Doman, and B. K. Shoichet, Identification and Prediction of Promiscuous Aggregating Inhibitors among Known Drugs, Journal of Medicinal Chemistry, vol.46, issue.21, pp.4477-4486, 2003.
DOI : 10.1021/jm030191r

C. A. Lipinski and A. Hopkins, Navigating chemical space for biology and medicine, Nature, vol.432, issue.7019, pp.855-861, 2004.
DOI : 10.1038/nature03193

A. S. Raghavendra and G. M. Maggiora, Molecular Basis Sets: A General Similarity-Based Approach for Representing Chemical Spaces, Journal of Chemical Information and Modeling, vol.47, issue.4, pp.1328-1340, 2007.
DOI : 10.1021/ci600552n

C. M. Dobson, Chemical space and biology, Nature, vol.432, pp.824-828, 2004.

G. M. Maggiora and V. Shanmugasundaram, Molecular Similarity Measures, in Chemoinformatics, 2004.

D. M. Maniyar, I. T. Nabney, B. S. Williams, and A. Sewing, Data Visualization during the Early Stages of Drug Discovery, Journal of Chemical Information and Modeling, vol.46, issue.4, pp.1806-1818, 2006.
DOI : 10.1021/ci050471a

C. M. Bishop, Neural Networks for Pattern Recognition, 1995.

J. W. Sammon, A Nonlinear Mapping for Data Structure Analysis, IEEE Transactions on Computers, vol.18, issue.5, pp.401-409, 1969.
DOI : 10.1109/T-C.1969.222678

T. Kohonen, Self-Organizing Maps, 1995.

C. M. Bishop, M. Svensén, and C. K. Williams, GTM: The Generative Topographic Mapping, Neural Computation, vol.10, issue.1, pp.215-234, 1998.
DOI : 10.1162/089976698300017953

D. Lowe and M. E. Tipping, NeuroScale: Novel topographic feature extraction with radial basis function networks, Adv. Neural Inf. Proc. Syst, vol.9, pp.543-549, 1997.

T. I. Oprea and J. Gottfries, Chemography: The Art of Navigating in Chemical Space, Journal of Combinatorial Chemistry, vol.3, issue.2, pp.157-166, 2001.
DOI : 10.1021/cc0000388

W. S. Torgerson, Theory and Methods of Scaling, 1958.

J. W. Godden and J. Bajorath, A Distance Function for Retrieval of Active Molecules from Complex Chemical Space Representations, Journal of Chemical Information and Modeling, vol.46, issue.3, pp.1094-1097, 2006.
DOI : 10.1021/ci050510i

G. M. Maggiora and M. A. Johnson, Concepts and Applications of Molecular Similarity, 1990.

R. W. Spencer, Diversity Analysis in High Throughput Screening, Journal of Biomolecular Screening, vol.2, issue.2, pp.69-70, 1997.
DOI : 10.1177/108705719700200203

T. Potter and H. Matter, Random or Rational Design? Evaluation of Diverse Compound Subsets from Chemical Structure Databases, Journal of Medicinal Chemistry, vol.41, issue.4, pp.478-488, 1998.
DOI : 10.1021/jm9700878

Y. C. Martin, J. L. Kofron, and L. M. Traphagen, Do Structurally Similar Molecules Have Similar Biological Activity?, Journal of Medicinal Chemistry, vol.45, issue.19, pp.4350-4358, 2002.
DOI : 10.1021/jm020155c

G. Cleuziou, Une méthode de classification non supervisée pour l'apprentissage de règles et la recherche d'informations, 2004.

W. M. Rand, Objective Criteria for the Evaluation of Clustering Methods, Journal of the American Statistical Association, vol.66, issue.336, pp.846-850, 1971.
DOI : 10.1080/01621459.1971.10482356

P. Jaccard, Étude comparative de la distribution florale dans une portion des Alpes et du Jura, Bulletin de la Société Vaudoise des Sciences Naturelles, vol.37, pp.547-579, 1901.

S. H. Cha, S. Choi, and C. C. Tappert, Anomaly between Jaccard and Tanimoto coefficients, Proceedings of Student-Faculty Research Day, 2009.

A. Lipkus, A proof of the triangle inequality for the Tanimoto distance, Journal of Mathematical Chemistry, vol.26, issue.1/3, pp.263-265, 1999.
DOI : 10.1023/A:1019154432472

A. Monge, Création et utilisation de chimiothèques optimisées pour la recherche "in silico" de nouveaux composés bioactifs, 2006.

A. G. Maldonado, J. P. Doucet, M. Petitjean, and B. T. Fan, Molecular similarity and diversity in chemoinformatics: From theory to applications, Molecular Diversity, vol.10, issue.1, pp.39-79, 2006.
DOI : 10.1007/s11030-006-8697-1

P. Willett, J. M. Barnard, and G. M. Downs, Chemical Similarity Searching, Journal of Chemical Information and Computer Sciences, vol.38, issue.6, pp.983-996, 1998.
DOI : 10.1021/ci9800211

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.453.1788

G. W. Bemis and M. A. Murcko, The Properties of Known Drugs. 1. Molecular Frameworks, Journal of Medicinal Chemistry, vol.39, issue.15, pp.2887-2893, 1996.
DOI : 10.1021/jm9602928

G. Bemis and M. Murcko, Properties of Known Drugs. 2. Side Chains, Journal of Medicinal Chemistry, vol.42, issue.25, pp.5095-5099, 1999.
DOI : 10.1021/jm9903996

S. H. Fitzgerald, M. Sabat, and H. M. Geysen, Diversity Space and Its Application to Library Selection and Design, Journal of Chemical Information and Modeling, vol.46, issue.4, pp.1588-1597, 2006.
DOI : 10.1021/ci060066z

J. Batista, J. W. Godden, and J. Bajorath, Assessment of Molecular Similarity from the Analysis of Randomly Generated Structural Fragment Populations, Journal of Chemical Information and Modeling, vol.46, issue.5, pp.1937-1944, 2006.
DOI : 10.1021/ci0601261

M. Rupp, E. Proschak, and G. Schneider, Kernel Approach to Molecular Similarity Based on Iterative Graph Similarity, Journal of Chemical Information and Modeling, vol.47, issue.6, pp.2280-2286, 2007.
DOI : 10.1021/ci700274r

A. Papp, A. Gulyas-Forro, Z. Gulyas, G. Dorman, L. Urge et al., Explicit Diversity Index (EDI): A Novel Measure for Assessing the Diversity of Compound Databases, Journal of Chemical Information and Modeling, vol.46, issue.5, pp.1898-1904, 2006.
DOI : 10.1021/ci060074f

R. D. Clark and W. J. Langton, -Dissimilarity and Hierarchical Clustering, Journal of Chemical Information and Computer Sciences, vol.38, issue.6, pp.1079-1086, 1998.
DOI : 10.1021/ci980107u

URL : https://hal.archives-ouvertes.fr/hal-00153539

D. M. Bayada, H. Hamersma, and V. J. van Geerestein, Molecular Diversity and Representativity in Chemical Databases, Journal of Chemical Information and Computer Sciences, vol.39, issue.1, pp.1-10, 1999.
DOI : 10.1021/ci980109e

M. Hassan, J. P. Bielawski, J. C. Hempel, and M. Waldman, Optimization and visualization of molecular diversity of combinatorial libraries, Molecular Diversity, vol.2, issue.1-2, pp.64-74, 1996.
DOI : 10.1007/BF01718702

A. M. Ferguson, D. E. Patterson, C. D. Garn, and T. L. Underiner, Designing Chemical Libraries for Lead Discovery, Journal of Biomolecular Screening, vol.1, issue.2, pp.65-73, 1996.
DOI : 10.1177/108705719600100204

B. R. Stockwell, Exploring biology with small organic molecules, Nature, vol.432, issue.7019, pp.846-854, 2004.
DOI : 10.1038/nature03196

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3165172

C. H. Reynolds, A. Tropsha, B. L. Pfahler, R. Druker, S. Chakravorty et al., Diversity and Coverage of Structural Sublibraries Selected Using the SAGE and SCA Algorithms, Journal of Chemical Information and Computer Sciences, vol.41, issue.6, pp.1470-1477, 2001.
DOI : 10.1021/ci010041u

P. Willett, Chemoinformatics: similarity and diversity in chemical libraries, Current Opinion in Biotechnology, vol.11, issue.1, pp.85-88, 2000.
DOI : 10.1016/S0958-1669(99)00059-2

P. H. Sneath and R. R. Sokal, Numerical Taxonomy, 1973.
DOI : 10.1002/9781118960608.bm00018

J. MacQueen, Some methods for classification and analysis of multivariate observations, Proc. of the Fifth Berkeley Symp, pp.281-297, 1967.

M. Ester, H. P. Kriegel, J. Sander, M. Wimmer et al., Density-connected sets and their application for trend detection in spatial databases, Sec. Inter. Conf. Know. Discov. Data Mining.

M. Ankerst, M. Breunig, H. P. Kriegel, and J. Sander, OPTICS: Ordering points to identify the clustering structure, Proc. Inter. Conf. Manag. Data ACM-SIGMOD, pp.49-60, 1999.

O. Rabal, R. Pascual, J. I. Borrell, and J. Teixido, Cell-Integral-Diversity Criterion: A Proposal for Minimizing Cluster Artifact in Cell-Based Selections, Journal of Chemical Information and Modeling, vol.47, issue.5, pp.1886-1896, 2007.
DOI : 10.1021/ci600433c

M. Snarey, N. K. Terrett, P. Willett, and D. J. Wilton, Comparison of algorithms for dissimilarity-based compound selection, Journal of Molecular Graphics and Modelling, vol.15, issue.6, pp.372-385, 1997.
DOI : 10.1016/S1093-3263(98)00008-4

D. K. Agrafiotis and V. S. Lobanov, Trees, Journal of Chemical Information and Computer Sciences, vol.39, issue.1, pp.51-58, 1999.
DOI : 10.1021/ci980100c

S. V. Trepalin, V. A. Gerasimenko, A. V. Kozyukov, N. P. Savchuk, and A. A. Ivaschenko, New Diversity Calculations Algorithms Used for Compound Selection, Journal of Chemical Information and Computer Sciences, vol.42, issue.2
DOI : 10.1021/ci0100649

S. D. Pickett, C. Luttman, V. Guerin, A. Laoui, and E. James, DIVSEL and COMPLIB - Strategies for the Design and Comparison of Combinatorial Libraries using Pharmacophoric Descriptors, Journal of Chemical Information and Computer Sciences, vol.38, issue.2, pp.144-150, 1998.
DOI : 10.1021/ci970060x

J. Mount, J. Ruppert, W. Welch, and A. N. Jain, : A Flexible Surface-Based System for Molecular Diversity, Journal of Medicinal Chemistry, vol.42, issue.1, pp.60-66, 1999.
DOI : 10.1021/jm970775r

F. E. Grubbs, Procedures for Detecting Outlying Observations in Samples, Technometrics, vol.11, issue.1, pp.1-21, 1969.
DOI : 10.1080/00401706.1969.10490657

D. Hawkins, Identification of Outliers
DOI : 10.1007/978-94-015-3994-4

R. B. Dean and W. J. Dixon, Simplified Statistics for Small Numbers of Observations, Analytical Chemistry, vol.23, issue.4, pp.636-638, 1951.
DOI : 10.1021/ac60052a025

B. Peirce, Criterion for the rejection of doubtful observations, The Astronomical Journal, vol.2, issue.45, pp.161-163
DOI : 10.1086/100259

P. R. Menard, J. S. Mason, I. Morize, and S. Bauerschmidt, Chemistry Space Metrics in Diversity Analysis, Library Design, and Compound Selection, Journal of Chemical Information and Computer Sciences, vol.38, issue.6, pp.1204-1213, 1998.
DOI : 10.1021/ci9801062

R. Guha, D. Dutta, P. C. Jurs, and T. Chen, -NN Curves: An Intuitive Approach to Outlier Detection Using a Distance Based Method, Journal of Chemical Information and Modeling, vol.46, issue.4, pp.1713-1722, 2006.
DOI : 10.1021/ci060013h

URL : https://hal.archives-ouvertes.fr/jpa-00210099

P. R. Menard, R. A. Lewis, and J. S. Mason, Rational Screening Set Design and Compound Selection: Cascaded Clustering, Journal of Chemical Information and Computer Sciences, vol.38, issue.3, pp.497-505, 1998.
DOI : 10.1021/ci980003j

Z. Drezner, The p-centre problem: heuristic and optimal algorithms, J. Oper. Res. Soc, vol.35, issue.8, pp.741-748, 1984.

R. Durier, The General One Center Location Problem, Mathematics of Operations Research, vol.20, issue.2, pp.400-414, 1995.
DOI : 10.1287/moor.20.2.400

J. Mihelic and B. Robic, Solving the k-center Problem Efficiently with a Dominating Set Algorithm, Journal of Computing and Information Technology, vol.13, issue.3, pp.225-234, 2005.
DOI : 10.2498/cit.2005.03.05

N. Mladenovic, J. Brimberg, P. Hansen, and J. A. Moreno-Pérez, The p-median problem: A survey of metaheuristic approaches, European Journal of Operational Research, vol.179, issue.3, pp.927-939, 2007.
DOI : 10.1016/j.ejor.2005.05.034

M. Badoiu, S. Har-Peled, and P. Indyk, Approximate clustering via core-sets, Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, STOC '02, pp.250-257, 2002.
DOI : 10.1145/509907.509947

D. S. Hochbaum and D. B. Shmoys, -Center Problem, Mathematics of Operations Research, vol.10, issue.2, pp.180-184, 1985.
DOI : 10.1287/moor.10.2.180

URL : https://hal.archives-ouvertes.fr/hal-00897097

B. D. Hudson, R. M. Hyde, E. Rahr, J. Wood, and J. Osman, Parameter Based Methods for Compound Selection from Chemical Databases, Quantitative Structure-Activity Relationships, vol.15, issue.4, pp.285-289, 1996.
DOI : 10.1002/qsar.19960150402

B. Gärtner, Fast and Robust Smallest Enclosing Balls, ESA, pp.325-338, 1999.
DOI : 10.1007/3-540-48481-7_29

E. Feldman, F. A. Lehrer, and T. L. Ray, Warehouse Location Under Continuous Economies of Scale, Management Science, vol.12, issue.9, pp.670-684, 1966.
DOI : 10.1287/mnsc.12.9.670

S. Salhi and R. A. Atkinson, Subdrop: A modified drop heuristic for location problems, Location Science, vol.3, issue.4, pp.267-273, 1995.
DOI : 10.1016/0966-8349(96)00003-4

F. E. Maranzana, On the location of supply points to minimize transport costs, Oper. Res. Quarterly, vol.15, 1964.

M. B. Teitz and P. Bart, Heuristic Methods for Estimating the Generalized Vertex Median of a Weighted Graph, Operations Research, vol.16, issue.5, pp.955-961, 1968.
DOI : 10.1287/opre.16.5.955

L. Kaufman and P. J. Rousseeuw, Finding Groups in Data : An Introduction to Cluster Analysis, 1990.
DOI : 10.1002/9780470316801

R. T. Ng and J. Han, Efficient and effective clustering methods for spatial data mining, VLDB'94, Proceedings of 20th International Conference on Very Large Data Bases, pp.144-155, 1994.

N. Mladenovic, M. Labbé, and P. Hansen, -Center problem with Tabu Search and Variable Neighborhood Search, Networks, vol.42, issue.1, pp.48-64, 2003.
DOI : 10.1002/net.10081

URL : https://hal.archives-ouvertes.fr/hal-00979295