Mining useful patterns in attributed graphs

Anes Bendimerad

Résumé

We address the problem of pattern discovery in vertex-attributed graphs. This kind of structure consists of a graph augmented with attributes associated to vertices. Vertex-attributed graphs provide a powerful abstraction that can be used to represent many datasets in an intuitive manner. Mining these graphs can be very useful for many applications. When mining vertex-attributed graphs, the principled integration of both graph and attribute data poses two important challenges. First, we need to define a pattern syntax that is intuitive and lends itself to efficient search. Considering that a pattern is generally defined over a subgraph, a pattern can be often huge in terms of vertices, which makes it difficult to grasp. Thus, the assimilation cost of a pattern is an important question that needs to be addressed. The second challenge is the formalization of the pattern interestingness. A pattern is generally relevant if it depicts some local properties that are somehow exceptional, otherwise, it will be already expected from the overall properties of the graph. Furthermore, the interestingness of patterns is subjective in practice, i.e., it significantly depends on the final user, her background knowledge and her preferences. Another common problem related to the interestingness of patterns is the redundancy issue in the result set. In other terms, a data mining approach may return a set of patterns that give redundant information, because these patterns cover very overlapping parts of vertices and attributes. We address these challenges for the problem of mining attributed graphs. We first introduce the task of discovering exceptional attributed subgraphs, which is rooted in Subgroup Discovery. The goal is to identify connected subgraphs whose vertices share characteristics that distinguish them from the rest of the graph. Then, we propose methods that aim to take into account the user and the domain knowledge when assessing the interestingness of patterns. We design a method that makes it possible to incorporate user’s background knowledge and pattern’s assimilation cost. This method is able to identify patterns that are both informative and easy to interpret. Furthermore, we propose another graph mining approach that integrates user’s preferences. This method exploits an interactive process with the user to bias the pattern interestingness. It has been defined for the task of geo-located event detection in social media. Then, we design an approach that is able to incorporate hierarchical attribute dependencies into the pattern interestingness, which allows to avoid redundancy related to this kind of semantic relations between attributes. In other terms, when the attributes are organized as a hierarchy, this method is able to account for the inference that the user would make about some attribute values when she is informed about values of other attributes. Finally, we conclude this thesis by discussing some research perspectives.

Un graphe est une structure qui permet de modéliser efficacement une large variété de données. Par exemple, un réseau social peut être représenté avec un graphe où les personnes sont les sommets, et leurs liens d'amitiés sont les arêtes. Ces graphes peuvent être augmentés avec des attributs décrivant ses sommets. Dans un réseau social, chaque personne peut être décrite par son age, ses centres d'intérêts, etc. C'est ce qu'on appelle un graphe attribué. L'analyse de ce type de graphes peut offrir une grande opportunité pour extraire des informations utiles et actionnables. Cela permet d'identifier des communautés ayant des centres d'intérêts particuliers dans un réseau social, de détecter des évènements à partir des tweets partagés, etc. Dans cette thèse, nous adressons le problème de fouille de graphes attribués. Plus précisément, nous proposons des méthodes qui analysent un graphe pour identifier des motifs : des sous-graphes ayant des caractéristiques particulières. Bien que ce problème a intéressé un grand nombre de chercheurs depuis des années, il reste encore plusieurs défis à relever. Nous adressons les questions : quand est-ce qu'un motif est intéressant pour l'utilisateur ? plusieurs facteurs entrent en jeu. Nous considérons qu'un motif est intéressant : (1) s'il montre une exceptionnalité par rapport au reste du graphe, (2) s'il donne une nouvelle information à l'utilisateur (il est imprévu), (3) s'il communique une information qui fait partie du centre d'intérêt de l'utilisateur (préférences). Pour mesurer la qualité d'un motif, nous proposons des modèles qui prennent en compte un ou plusieurs de ces trois facteur. Nous définissons des algorithmes qui déterminent les meilleurs motifs selon chaque modèle proposé, et nous effectuons des études empiriques pour évaluer l'efficacité de chacun de ces algorithmes.

Mining useful patterns in attributed graphs

Exploration de modèles utiles dans les graphiques attribués

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager