GraphLab and GraphBuilder

Posted on April 5, 2013 by ivotron

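In 2010, Google published Pregel [1], a graph processing engine built on the bulk synchronous parallel (BSP) model [2]. Its programming model is usually called vertex-centric (as opposed to edge-centric): the computation is written as a function that runs at each vertex and exchanges messages with neighboring vertices. Giraph is an open-source library that implements the Pregel model on top of Hadoop MapReduce.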
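To make the vertex-centric BSP idea concrete, here is a minimal single-process sketch in Python. It is not Pregel's or Giraph's API; the function name `pregel_max` and the dict-based graph representation are assumptions made for illustration. Each vertex repeatedly takes the maximum of its own value and its incoming messages, and the loop stops once no messages are in flight.

```python
from collections import defaultdict

def pregel_max(graph, values):
    """Propagate the maximum value through a graph, one superstep at a time.

    graph  : dict mapping every vertex to a list of its out-neighbors
    values : dict mapping every vertex to its initial value
    (Both structures are illustrative, not part of any real framework.)
    """
    values = dict(values)
    active = set(graph)           # every vertex starts active
    inbox = defaultdict(list)     # messages delivered at the start of a superstep
    superstep = 0

    while active or inbox:
        outbox = defaultdict(list)
        # A vertex computes if it is active or was woken up by a message.
        for v in active | set(inbox):
            new_value = max([values[v]] + inbox.get(v, []))
            changed = new_value != values[v]
            values[v] = new_value
            # Send in superstep 0, and afterwards only when the value changed.
            if superstep == 0 or changed:
                for neighbor in graph[v]:
                    outbox[neighbor].append(new_value)
        # Barrier: every vertex votes to halt; pending messages reactivate receivers.
        active = set()
        inbox = outbox
        superstep += 1
    return values, superstep


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    values, steps = pregel_max(graph, {"a": 3, "b": 6, "c": 2, "d": 1})
    print(values, steps)
```

The key property is the barrier at the end of each loop iteration: messages sent in one superstep only become visible in the next one, which is what distinguishes BSP from an asynchronous engine.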

An alternative to Giraph is GraphLab [3], another vertex-centric system from the post-MapReduce era. Unlike Giraph, GraphLab has its own execution engine and can run asynchronously, using HDFS for storage.
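To illustrate what asynchronous execution means in practice, here is a rough Python sketch. It does not use GraphLab's API; `async_max`, the FIFO scheduler, and the assumption of an undirected graph (symmetric adjacency lists) are all mine. Instead of waiting for a global barrier, each update reads its neighbors' current values directly and reschedules them when its own value changes.

```python
from collections import deque

def async_max(graph, values):
    """Propagate the maximum value with a work queue instead of supersteps.

    graph  : dict mapping every vertex to a list of neighbors (assumed symmetric)
    values : dict mapping every vertex to its initial value
    """
    values = dict(values)
    queue = deque(graph)      # toy scheduler: vertices waiting for an update
    queued = set(graph)
    updates = 0

    while queue:
        v = queue.popleft()
        queued.discard(v)
        updates += 1
        # Read the neighbors' latest values immediately; there is no barrier.
        best = max([values[v]] + [values[n] for n in graph[v]])
        if best != values[v]:
            values[v] = best
            # Neighbors may now be stale, so put them back on the scheduler.
            for n in graph[v]:
                if n not in queued:
                    queue.append(n)
                    queued.add(n)
    return values, updates


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(async_max(graph, {"a": 3, "b": 6, "c": 2, "d": 1}))
```

Because an update sees changes made earlier in the same pass, new values can spread without waiting for a round boundary. A real distributed engine also has to handle locking and consistency between concurrent updates, which this single-threaded sketch ignores.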

GraphBuilder [4], as the name implies, is a set of MapReduce tasks that extract, normalize, partition and serialize (among other things) a graph out of unstructured data and write it to HDFS in graph-specific formats. It is designed to produce the input for batch-oriented graph-processing frameworks such as GraphLab.
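The four stages named above can be sketched as a chain of plain Python functions. This is an in-memory toy, not GraphBuilder's API: the wiki-style `[[link]]` input, the function names, the hash-by-source partitioning, and the JSON-lines output are all assumptions chosen to keep the example short.

```python
import json
import re
from collections import defaultdict

LINK = re.compile(r"\[\[([^\]]+)\]\]")   # hypothetical link syntax in the raw text

def extract(docs):
    """Map-like step: emit one (source, target) edge per link found in the text."""
    for title, text in docs.items():
        for target in LINK.findall(text):
            yield title, target

def normalize(edges):
    """Canonicalize vertex names, then drop self-loops and duplicate edges."""
    seen = set()
    for src, dst in edges:
        edge = (src.strip().lower(), dst.strip().lower())
        if edge[0] != edge[1] and edge not in seen:
            seen.add(edge)
            yield edge

def partition(edges, num_parts):
    """Reduce-like step: group edges into partitions by source vertex."""
    parts = defaultdict(list)
    for src, dst in edges:
        # Deterministic toy hash; the built-in hash() of a str is salted per run.
        parts[sum(src.encode()) % num_parts].append((src, dst))
    return parts

def serialize(parts):
    """Render each partition as JSON lines, one edge per line."""
    return {
        p: "\n".join(json.dumps({"src": s, "dst": d}) for s, d in edges)
        for p, edges in sorted(parts.items())
    }

def build_graph(docs, num_parts=2):
    return serialize(partition(normalize(extract(docs)), num_parts))


if __name__ == "__main__":
    docs = {"A": "see [[B]] and [[b]] and [[A]]", "B": "back to [[A]]"}
    for part, payload in build_graph(docs).items():
        print(part, payload)
```

In a real deployment each stage would run as distributed tasks and the partitions would be written to files on HDFS; hashing by source vertex here is only a stand-in for whatever partitioning scheme the target framework expects.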

The GraphBuilder/GraphLab architecture.

References

[1] G. Malewicz, M.H. Austern, A.J. Bik, J.C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel: A system for large-scale graph processing,” Proceedings of the 2010 international conference on management of data, New York, NY, USA: ACM, 2010, pp. 135–146.

[2] L.G. Valiant, “A bridging model for parallel computation,” Commun. ACM, vol. 33, Aug. 1990, pp. 103–111.

[3] Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J.M. Hellerstein, “Distributed GraphLab: A framework for machine learning and data mining in the cloud,” Proc. VLDB Endow., vol. 5, Apr. 2012, pp. 716–727.

[4] T.L. Willke, N. Jain, and H. Gu, “GraphBuilder – A scalable graph construction library for Apache™ Hadoop™,” 2012.