Large Scale Graph Processing On HBase and Map/Reduce on Greenplum


Details
Large-Scale Graph Processing Using HBase
Chris McCubbin, TexelTek Inc.
Understanding how nodes interconnect in large graphs is an important problem in many fields. We wish to find the nodes that connect two nodes or two groups of source nodes. To find these connecting nodes in huge graphs, we have devised a highly parallelized variant of a k-shortest-path algorithm that leverages the power of the Hadoop distributed computing system and the HBase distributed key/value store. We show how our system enables previously unobtainable graph analysis by finding these connecting nodes in graphs of one billion nodes or more, on modest commodity hardware, in just minutes.
Chris McCubbin has bachelor's degrees in Mathematics and CS and a master's degree in CS, all from UMBC. He worked for a long time at the Applied Physics Lab on intelligent unmanned vehicle swarms, and recently moved to TexelTek to work on large data analysis, including social network analysis and computer network defense. He is the team lead of the Research Team at TexelTek, currently a group of 6 researchers.
Hadoop on Greenplum
Will Duckworth, comScore Inc.
Learn how one company has built an environment that supports processing over 500 billion rows of web log data in a syndicated production environment. The talk will focus on the methods used to leverage multiple large-scale analytical systems, including Hadoop, Hive, and Pig.
Will Duckworth, who joined comScore in 2004, manages development and operations teams in support of comScore's efforts to measure the digital world. He has led several projects to develop grid-based decision support systems for comScore and to roll out distributed processing in Hadoop and MPP environments.
