Conference paper, Year: 2013

Distributed Large-scale Natural Graph Factorization

Abstract

Natural graphs, such as social networks, email graphs, or instant messaging patterns, have become pervasive on the internet. These graphs are massive, often containing hundreds of millions of nodes and billions of edges. While some theoretical models have been proposed to study such graphs, their analysis is still difficult due to the scale and nature of the data. We propose a framework for large-scale graph decomposition and inference. To handle this scale, our framework is distributed so that the data are partitioned over a shared-nothing set of machines. We propose a novel factorization technique that relies on partitioning a graph so as to minimize the number of neighboring vertices, rather than the number of edges, across partitions. Our decomposition is based on a streaming algorithm. It is network-aware in that it adapts to the network topology of the underlying computational hardware. We use local copies of the variables and an efficient asynchronous communication protocol to synchronize the replicated values, so that most of the computation proceeds without incurring the cost of network communication. On a graph of 200 million vertices and 10 billion edges, derived from an email communication network, our algorithm retains convergence properties while allowing for almost linear scalability in the number of computers.
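
The difference between the two partitioning objectives named in the abstract can be illustrated with a small sketch. The code below is not the authors' algorithm: the function names, the toy graph, and the reading of "neighboring vertices" as the count of distinct remote vertices each partition would have to keep a local copy of are all assumptions made for illustration. It shows only that the two objectives can rank the same pair of partitions in opposite order.

```python
from collections import defaultdict


def edge_cut(edges, part):
    """Count edges whose two endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def neighbor_vertex_cost(edges, part):
    """Count distinct remote neighbors, summed over partitions.

    For each partition p, collect the vertices outside p that are adjacent
    to at least one vertex inside p. Under the assumption made here, each
    such vertex would need one replicated local copy in p.
    """
    remote = defaultdict(set)
    for u, v in edges:
        if part[u] != part[v]:
            remote[part[u]].add(v)
            remote[part[v]].add(u)
    return sum(len(vertices) for vertices in remote.values())


# Hypothetical toy graph: a dense block (complete bipartite graph between
# {0, 1} and {2, 3}) plus three disjoint edges.
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (4, 5), (6, 7), (8, 9)]

# Partition A cuts through the dense block: vertices 2 and 3 go to machine 1.
part_a = {v: 0 for v in range(10)}
part_a.update({2: 1, 3: 1})

# Partition B cuts the three disjoint edges: vertices 5, 7, 9 go to machine 1.
part_b = {v: 0 for v in range(10)}
part_b.update({5: 1, 7: 1, 9: 1})

for name, part in (("A", part_a), ("B", part_b)):
    print(name, edge_cut(edges, part), neighbor_vertex_cost(edges, part))
```

On this toy graph, partition B cuts fewer edges (3 versus 4), while partition A requires fewer replicated vertices (4 versus 6). If the communication cost is driven by the number of replicated values to synchronize, as the abstract suggests, the vertex-based count is the more relevant quantity. Load balance across machines is ignored in this sketch.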

Dates and versions

hal-00918478, version 1 (13-12-2013)

Identifiers

Cite

Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, Alexander J. Smola. Distributed Large-scale Natural Graph Factorization. IW3C2 - International World Wide Web Conference, May 2013, Rio de Janeiro, Brazil. pp.37, ⟨10.1145/2488388.2488393⟩. ⟨hal-00918478⟩