Dear Hive members,
We have another great meetup planned for the 28th of June at 6:30 pm.
There will be two interesting talks: MapR will present and demo Apache Drill, and Qubole will present Hive & Hadoop as a service. A third talk, Facebook's Hive use-case, has been postponed to a future meetup.
De-duplicating data on the object graph with Hive
Speaker: Abhishek Doshi, Software Engineer, Facebook
Facebook enables users to easily express connections to objects, e.g. the books they read, the movies they watch, the TV shows they like. The objects can originate from Facebook or from partner platforms. This talk presents how Facebook uses Hive to de-duplicate objects across the object graph: for example, an Open Graph object from IMDb that represents the movie Top Gun and another from Netflix that also represents Top Gun.
Abhi studied Electrical Engineering and Computer Science at UC Berkeley and started at Facebook in August 2009. He worked on Ads and platform payments and, after moving to London in December 2012, joined the platform entities team.
[The above talk has been postponed to a future Hive meetup, yet to be scheduled, because of the speaker's time constraints.]
Apache Drill - interactive, ad-hoc query for large-scale datasets
Speaker: Michael Hausenblas, Chief Data Engineer EMEA, MapR Technologies
Apache Drill is a distributed system for interactive analysis of large-scale datasets, inspired by Google's Dremel technology. It is designed to scale to thousands of servers and to process petabytes of data in seconds. Since its inception in mid-2012, Apache Drill has gained widespread interest in the community, attracting hundreds of people. In this talk, we discuss how Apache Drill enables interactive ad-hoc queries at scale, walk through use cases, and review the system architecture. We then focus on Apache Drill's extensibility points and its supported query languages and data sources, and close with a demo of the system.
Michael works at MapR Technologies as Chief Data Engineer EMEA. His background is in large-scale data integration research and development, advocacy and standardisation. He has experience with NoSQL databases and the Hadoop ecosystem. Michael speaks at events, blogs about big data, and writes articles and books on the topic. Michael contributes to Apache Drill, a distributed system for interactive analysis of large-scale datasets.
Cloud Optimized Hadoop and Hive
Speaker: Joydeep Sen Sarma, Qubole
Qubole provides an analytics platform as a service in the Cloud. Hadoop and Hive are among the core components of our technology stack. In this talk, I will describe how we conceptualized Hadoop and Hive as a service and the key challenges we have faced in building a multi-tenant implementation in the cloud. I will cover some of the major operational, usability and performance enhancements we have made to Hadoop and Hive so far, which have helped us achieve dramatic improvements over previous-generation cloud offerings.
Joydeep is co-founder/CTO at Qubole, where he's busy building the best analytics platform in the Cloud. Previously he was at Facebook, where he bootstrapped the Hadoop-based analytics stack, started the Apache Hive project and led the Data Infrastructure team. Joydeep was a key contributor on the Facebook Messages architecture team that brought Apache HBase to Facebook. He cut his teeth building data-driven applications as the lead engineer on Yahoo's in-house Recommendation Platform.