Past Meetup

Big Data & Cloud Computing - Help, Educate & Demystify.

130 people went

Network Meeting Center at Techmart

5201 Great America Pkwy # 122 · Santa Clara, CA

How to find us

Building next to the Hyatt in Santa Clara, CA

Meet-up to help, educate and demystify "Big Data" & "Cloud Computing" technologies for businesses, professionals & individuals.


6:00 pm – 7:00 pm:
Registration & Mixer

7:00 pm – 8:00 pm:
Recommendation Engine Powered by Hadoop
- Pranab Ghosh

8:00 pm – 9:00 pm:
Building Data Driven Products using Hadoop at LinkedIn
- Mitul Tiwari

Topic Details:

7:00 pm – 8:00 pm:
Recommendation Engine Powered by Hadoop
- Pranab Ghosh

Personalized recommendations are ubiquitous on social networking and shopping sites these days. How do they do it? As long as enough user interaction data is available for items, e.g., products on shopping sites, a recommendation engine based on what's known as "Collaborative Filtering" is not that difficult to build. Since comparing pairs of items or users leads to a combinatorial explosion as the data grows, Hadoop can play a critical role in processing the massive amounts of data involved in collaborative filtering based solutions. In this presentation, I will cover a Hadoop based recommendation engine implementation using collaborative filtering.
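As a rough illustration of the idea (not the presenter's implementation), the sketch below does item-based collaborative filtering with simple co-occurrence counts in plain Python. The function names, the sample data and the scoring rule are all assumptions made for this example. On Hadoop, the grouping and counting steps would typically run as MapReduce jobs, because the number of item pairs grows quickly with the data.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(interactions):
    """Count how many users interacted with each pair of items."""
    # "Map"-like step: group the items each user interacted with.
    items_by_user = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)

    # "Reduce"-like step: emit every item pair per user and sum the counts.
    # This pair generation is where the combinatorial growth comes from.
    counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)


def recommend(interactions, user, top_n=3):
    """Rank unseen items by how often they co-occur with the user's items."""
    counts = cooccurrence_counts(interactions)
    seen = {item for u, item in interactions if u == user}
    scores = defaultdict(int)
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
        elif b in seen and a not in seen:
            scores[a] += n
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical (user, item) interaction data, for illustration only.
SAMPLE = [
    ("alice", "book"), ("alice", "lamp"),
    ("bob", "book"), ("bob", "lamp"), ("bob", "desk"),
    ("carol", "book"), ("carol", "desk"),
]

if __name__ == "__main__":
    print(recommend(SAMPLE, "alice"))
```

Here "alice" has not seen "desk", which co-occurs with both of her items, so it is the only candidate returned. A production system would normally use a normalized similarity measure rather than raw counts.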

About the presenter:
Pranab Ghosh is a freelance consultant, currently working for Motorola, helping them process ever-growing volumes of mobile device usage data using Hadoop and other cloud technologies. He has worked with a myriad of technologies and platforms in various business domains for early stage startups, large corporations and anything in between. He is an active blogger and open source contributor. His current interests are big data, distributed processing, NoSQL databases and data mining. More about him can be found here

8:00 pm – 9:00 pm:
Building Data Driven Products using Hadoop at LinkedIn
- Mitul Tiwari

Hadoop and other big data tools such as Voldemort, Azkaban, and Kafka drive many data-driven products at LinkedIn, such as “People You May Know” and various recommendation products such as “Jobs You May Be Interested In”. Each of these products can be viewed as a large-scale social recommendation problem, which analyzes billions of possible options and suggests appropriate recommendations.

Since these products analyze billions of edges and terabytes of data daily, they can be built only on a large-scale distributed compute infrastructure. The Kafka publish-subscribe messaging system is used to get the data into the Hadoop file system. Hadoop MapReduce is used as the basic building block to analyze billions of potential options and predict recommendations. Over a hundred MapReduce tasks are combined into a workflow using Azkaban, a Hadoop workflow management tool. The output of the Hadoop jobs is finally stored in the Voldemort key-value store so that the data can be served efficiently at run-time.

During this talk, the audience will get a basic understanding of the link prediction problem behind the “People You May Know” feature, which is a large-scale social recommendation problem. An overview of the solution to this problem using Hadoop MapReduce, the Azkaban workflow management tool, and the Voldemort key-value store will be presented. I will also describe how to efficiently compute the number of common connections (triangle closing) using Hadoop MapReduce, which is one of the many signals in link prediction.
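To make the common-connections signal concrete, here is a minimal single-process sketch in Python. It is not LinkedIn's implementation; the function name and the sample graph are assumptions for illustration. The structure mirrors how such a count is often phrased for MapReduce: for each member, emit every pair of that member's connections, then sum the emitted pairs. Pairs that are already connected are skipped, since they need no recommendation.

```python
from collections import defaultdict
from itertools import combinations


def common_connection_counts(edges):
    """For each unconnected pair, count the connections they share."""
    # Build an undirected adjacency map from the edge list.
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    neighbors = dict(adjacency)

    counts = defaultdict(int)
    # "Map"-like step: each member emits all pairs of its own connections,
    # because that member is a common connection of every such pair.
    for member, friends in neighbors.items():
        for u, v in combinations(sorted(friends), 2):
            # Skip pairs that already know each other.
            if v not in neighbors[u]:
                # "Reduce"-like step: sum the emissions per pair.
                counts[(u, v)] += 1
    return dict(counts)


# Hypothetical connection graph, for illustration only.
SAMPLE_EDGES = [
    ("ann", "bob"), ("ann", "cat"), ("ann", "eve"),
    ("bob", "dan"), ("cat", "dan"),
]

if __name__ == "__main__":
    for pair, n in sorted(common_connection_counts(SAMPLE_EDGES).items()):
        print(pair, n)
```

In this sample, "ann" and "dan" are not connected but share "bob" and "cat", so the pair gets a count of 2. Note that a member with many connections emits a quadratic number of pairs, which is one reason this computation needs care at scale.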

Overall, people interested in building interesting applications using Hadoop MapReduce will hugely benefit from this talk.

About the presenter:
Mitul Tiwari is a computer scientist and a software engineer based in Silicon Valley. Currently, he is part of the Search, Network, and Analytics Group at LinkedIn as a Senior Research Engineer. Previously, he worked at Kosmix as a Member of Technical Staff. He completed his PhD in Computer Science at the University of Texas at Austin in 2007. Earlier, he received his undergraduate degree, a Bachelor of Technology in Computer Science and Engineering, from the Indian Institute of Technology, Bombay. At LinkedIn, he is working on data-driven products such as "People You May Know". His interests include large-scale data mining, distributed systems, network algorithms, and information retrieval.

Wine, beer & light snacks will be served.