SINr: Fast Computing of Sparse Interpretable Node Representations is not a Sin!

While graph embedding aims at learning low-dimensional node representations that capture the graph topology, word embedding focuses on learning word vectors that encode semantic properties of the vocabulary. The former finds applications in tasks such as link prediction and node classification, while the latter is systematically used in natural language processing. Most of the time, graph and word embeddings are treated as distinct tasks. However, word co-occurrence matrices, widely used to extract word embeddings, can be seen as graphs. Furthermore, most network embedding techniques rely either on a word embedding methodology (Word2vec) or on matrix factorization, which is also widely used for word embedding. These methods are usually computationally expensive and parameter-dependent, and the dimensions of the embedding space are not interpretable. To circumvent these issues, we introduce the Lower Dimension Bipartite Graphs Framework (LDBGF), which takes advantage of the fact that all graphs can be described as bipartite graphs, even in the case of textual data. This underlying bipartite structure may be explicit, as in coauthor networks. However, with LDBGF, we focus on uncovering latent bipartite structures, lying for instance in social or word co-occurrence networks, and especially those structures that provide more concise and interpretable representations of the graph at hand. We further propose SINr, an efficient implementation of the LDBGF approach that extracts Sparse Interpretable Node Representations using community structure to approximate the underlying bipartite structure. In the case of graph embedding, our near-linear time method is the fastest of our benchmark, is parameter-free, and provides state-of-the-art results on the classical link prediction task. We also show that low-dimensional vectors can be derived from SINr using singular value decomposition. In the case of word embedding, our approach proves to be very efficient on the classical similarity evaluation.
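To make the idea of community-based sparse node vectors concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that a community partition is already available (from any community detection step) and that each node is weighted by the share of its weighted degree that falls in each community. The abstract does not state the paper's actual weighting, so this choice, the function name `sparse_node_vectors`, and the toy graph are all hypothetical.

```python
from collections import defaultdict


def sparse_node_vectors(edges, community_of):
    """Return {node: {community: share of the node's weighted degree}}.

    edges: iterable of (u, v, weight) tuples for an undirected graph.
    community_of: dict mapping each node to a community label
                  (assumed to be given; no detection is done here).
    """
    degree = defaultdict(float)
    toward = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
        # Accumulate the weight each endpoint sends to the other's community.
        toward[u][community_of[v]] += w
        toward[v][community_of[u]] += w
    # Only communities a node actually touches get an entry, hence sparsity.
    return {
        node: {c: w / degree[node] for c, w in comms.items()}
        for node, comms in toward.items()
    }


# Toy graph (illustrative data only): two triangles joined by one edge.
edges = [
    ("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0),
    ("d", "e", 1.0), ("d", "f", 1.0), ("e", "f", 1.0),
    ("c", "d", 1.0),
]
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
vectors = sparse_node_vectors(edges, community_of)
```

In this sketch each dimension corresponds to one community, so a non-zero entry can be read directly as "this node is connected to that group", which is one plausible reading of the interpretability claimed above. Nodes only store the communities they touch, so the vectors stay sparse.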

Keywords

Graph embedding; Community detection; Word embedding; Linear-time algorithm; Network science; Link prediction

Domains

Social and Information Networks [cs.SI]; Machine Learning [cs.LG]; Artificial Intelligence [cs.AI]; Computation and Language [cs.CL]

Main file

SINr_fast_computing_of_Sparse_Interpretable_Node_Representations_is_not_a_sin.pdf (506.78 KB)

Origin: Files produced by the author(s)

Contributor: Thibault Prouteau

https://hal.science/hal-03197434

Submitted on: Wednesday, April 14, 2021, 09:29:13

Last modified on: Thursday, February 1, 2024, 10:04:06

Long-term archiving on: Thursday, July 15, 2021, 18:09:33

Dates and versions

hal-03197434, version 1 (14-04-2021)

Identifiers

HAL Id: hal-03197434, version 1
DOI: 10.1007/978-3-030-74251-5_26

Cite

Thibault Prouteau, Victor Connes, Nicolas Dugué, Anthony Perez, Jean-Charles Lamirel, et al. SINr: Fast Computing of Sparse Interpretable Node Representations is not a Sin!. Advances in Intelligent Data Analysis XIX, 19th International Symposium on Intelligent Data Analysis, IDA 2021, Apr 2021, Porto, Portugal. pp.325-337, ⟨10.1007/978-3-030-74251-5_26⟩. ⟨hal-03197434⟩

Collections

UNIV-NANTES INSTITUT-TELECOM UNIV-RENNES1 CNRS INRIA UNIV-LEMANS UNIV-ORLEANS EC-NANTES IRISA UNAM UNIV-LORRAINE LORIA LORIA-NLPKD UR1-MATH-STIC LIUM LS2N LS2N-TALN LS2N-DUKE UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE INSA-CVL UR1-MATH-NUM NANTES-UNIVERSITE
