Statistical learning algorithms for geometric and topological data analysis

Thomas Bonis

Thesis. Year: 2016

Algorithmes d'apprentissage statistique pour l'analyse géométrique et topologique de données

Author: Thomas Bonis (1)

(1) Inria Saclay - Ile de France

Abstract

In this thesis, we study data analysis algorithms using random walks on neighborhood graphs, or random geometric graphs. It is known that random walks on such graphs approximate continuous objects called diffusion processes. In the first part of this thesis, we use this approximation result to propose a new soft clustering algorithm based on the mode-seeking framework. For our algorithm, we want to define clusters using the properties of a diffusion process. Since we do not have access to this continuous process, our algorithm uses a random walk on a random geometric graph instead. After proving the consistency of our algorithm, we evaluate its efficiency on both real and synthetic data. We then tackle the issue of the convergence of invariant measures of random walks on random geometric graphs. As these random walks converge to a diffusion process, we can expect their invariant measures to converge to the invariant measure of this diffusion process. Using an approach based on Stein's method, we manage to quantify this convergence. Moreover, the method we use is more general and can be used to obtain other results, such as convergence rates for the Central Limit Theorem. In the last part of this thesis, we use persistent homology, a concept from algebraic topology, to improve the pooling step of the bag-of-words approach for 3D shapes.

In this thesis, we are interested in data analysis algorithms using random walks on neighborhood graphs, or random geometric graphs, built from the data. It is known that random walks on these graphs approximate continuous objects called diffusion processes. We first use this result to propose a new soft clustering algorithm of the mode-seeking type. In this algorithm, clusters are defined using the properties of a certain diffusion process, which we approximate by a random walk on a neighborhood graph. After proving the convergence of our algorithm, we study its empirical performance on several datasets. We then turn to the convergence of the stationary measures of random walks on random geometric graphs to the stationary measure of the limiting diffusion process. Using an approach based on Stein's method, we manage to quantify this convergence. Our result in fact applies in a more general setting than random walks on neighborhood graphs, and we use it to prove other results: for example, we obtain convergence rates for the central limit theorem. In the last part of this thesis, we use a concept from algebraic topology called persistent homology to improve the pooling step in the bag-of-words approach for 3D shape recognition.

Keywords

Soft clustering; Bag-of-words; Persistent homology; Stein's method; Random geometric graphs; Random walks

Domains

Probability [math.PR]; Statistics [math.ST]; Machine Learning [stat.ML]

Main file

73822_BONIS_2016_diffusion.pdf (2.15 MB)

Origin: Version validated by the jury (STAR)

ABES STAR : Contact

https://hal.science/tel-01402801

Submitted on: Monday, February 13, 2017, 17:01:39

Last modified on: Monday, April 22, 2024, 13:20:00

Long-term archiving on: Sunday, May 14, 2017, 16:43:02

Dates and versions

tel-01402801, version 1 (25-11-2016)

tel-01402801, version 2 (28-11-2016)

tel-01402801, version 3 (13-02-2017)

Identifiers

HAL Id: tel-01402801, version 3

Cite

Thomas Bonis. Statistical learning algorithms for geometric and topological data analysis. Probability [math.PR]. Université Paris-Saclay, 2016. English. ⟨NNT : 2016SACLS459⟩. ⟨tel-01402801v3⟩

Collections

INRIA STAR INRIA2 UNIV-PARIS-SACLAY GS-COMPUTER-SCIENCE

640 views

488 downloads
