Combining path-constrained random walks to recover link weights in heterogeneous information networks

Abstract: Heterogeneous information networks (HIN) are abstract representations of systems composed of multiple types of entities and their relations. Given a pair of nodes in a HIN, this work aims to recover the exact weight of the link incident to these two nodes, given some of the other links present in the HIN. In practice, this weight is approximated by a linear combination of probabilities obtained from path-constrained random walks performed on the HIN, i.e., random walks in which the walker is forced to follow a specific sequence of node types and edge types, commonly called a meta path. The method is general enough to compute the link weight between nodes of any types. Experiments on Twitter data show the applicability of the method.

1. Introduction

Networked entities are ubiquitous in real-world applications. Examples of such entities are humans in social or communication activities and proteins in biochemical interactions. Heterogeneous information networks (HIN), abstract representations of systems composed of multiple types of entities and their relations, are good candidates for modeling such entities together with their relations, since they can effectively fuse a huge quantity of information and carry rich semantics in their nodes and links. In the last decade, heterogeneous information network analysis has attracted growing interest, and many novel data mining tasks have been designed for such networks, such as similarity search, link prediction, clustering and classification, to name just a few.

The goal of this work is to recover, for a given pair of nodes in a weighted HIN, the actual weight of the link incident to these two nodes, given some of the other links present in the HIN. Capturing not only the presence of a link but also its actual weight can be useful, for instance, in recommendation systems, where the weight can be interpreted as the "rating" a user would give to an item.
Another application would be the detection of disease-gene candidates through the prediction of protein-protein interactions.

This problem can be related to the node similarity problem, since similar nodes tend to be connected. Indeed, the similarity score between two nodes, computed as a particular function of these two nodes, can be seen as the strength of their connection and hence as the weight of the link connecting them. Here, the particular function is related to a random walk on the graph. In HIN, most similarity scores [6, 9] are based on the concept of meta path, roughly defined as a concatenation of node types linked by corresponding link types. The type of a node or link is basically a label in the abstract representation. Meta paths can be used as a constraint on a classic random walk: the walker is allowed to take only paths satisfying a particular meta path. These path-constrained random walks can therefore explicitly take into account the different semantics present in a HIN.

Back to our goal, the target weight is approximated by a linear combination of probabilities resulting from path-constrained random walks performed on the HIN. The proposed method aims at finding a relevant set of meta paths and the best possible coefficients such that the difference between the exact link weight and its approximation is minimized.

The rest of this paper is organized as follows. In Section 2, we recall basic concepts about HIN and present the problem statement. Section 3 explains our method and we apply it on Twitter data related to
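
As an illustration of the idea described in the introduction, the following minimal sketch computes path-constrained random walk probabilities on a toy network and combines them linearly. Everything in it is an assumption made for illustration: the toy graph, the edge type names ("follows", "uses"), the function names, the choice to represent a meta path by its sequence of edge types, the rule that a walker with no admissible edge is dropped, and the placeholder coefficients. It is not the authors' implementation, and the paper's exact transition rule may differ.

```python
from collections import defaultdict

# Hypothetical toy HIN: EDGES[edge_type][source][target] = weight.
EDGES = {
    "follows": {"u1": {"u2": 1.0, "u3": 1.0}, "u2": {"u3": 2.0}},
    "uses": {"u2": {"h1": 3.0, "h2": 1.0}, "u3": {"h1": 1.0}},
}


def pcrw_distribution(edges, source, meta_path):
    """Distribution of a walker starting at `source` and following `meta_path`.

    `meta_path` is given as a sequence of edge types. At each step the walker
    picks an outgoing edge of the required type with probability proportional
    to its weight. Probability mass sitting on a node with no such edge is
    dropped (an assumption of this sketch).
    """
    dist = {source: 1.0}
    for edge_type in meta_path:
        nxt = defaultdict(float)
        for node, mass in dist.items():
            out = edges.get(edge_type, {}).get(node, {})
            total = sum(out.values())
            if total == 0:
                continue  # no admissible edge: this mass is lost
            for target, weight in out.items():
                nxt[target] += mass * weight / total
        dist = dict(nxt)
    return dist


def approximate_weight(edges, source, target, meta_paths, coefficients):
    """Linear combination of the probabilities of reaching `target`."""
    return sum(
        coef * pcrw_distribution(edges, source, path).get(target, 0.0)
        for path, coef in zip(meta_paths, coefficients)
    )


if __name__ == "__main__":
    paths = [["follows", "uses"], ["follows", "follows", "uses"]]
    coefs = [2.0, 1.0]  # placeholder values, not fitted
    print(pcrw_distribution(EDGES, "u1", paths[0]))
    print(approximate_weight(EDGES, "u1", "h1", paths, coefs))
```

In this sketch the coefficients are fixed by hand. In the setting described above they would instead be chosen to minimize the difference between known link weights and their approximations, for example with a least-squares fit over node pairs whose weights are known; that fitting step is not shown here.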

https://hal.archives-ouvertes.fr/hal-02085410
Contributor: Lionel Tabourier
Submitted on: Saturday, March 30, 2019 - 6:16:59 PM
Last modification on: Friday, July 5, 2019 - 3:26:03 PM

Citation

Hông-Lan Botterman, Robin Lamarche-Perrin. Combining path-constrained random walks to recover link weights in heterogeneous information networks. CompleNet 2019 - 10th Conference on Complex Networks, Mar 2019, Tarragona, Spain. pp.97-109, ⟨10.1007/978-3-030-14459-3_8⟩. ⟨hal-02085410⟩
