NOTE: THIS MEETUP IS IN NEW YORK CITY at 6:00pm EST
The Elephant in the Cloud: A Quest for the Next Generation Hadoop Architecture - talk and hands-on Pivotal Hadoop
March 20, 2014 - Pivotal Labs office NYC
6:00-7:00 pm EST gyros and other food and networking!
7:00-8:00 pm EST The Elephant in the Cloud with Roman Shaposhnik
8:00-9:00 pm EST Hands-on Pivotal Hadoop
(Please note all times are in EST even if the settings on this meetup page are not).
The Elephant in the Cloud: A Quest for the Next Generation Hadoop Architecture
with Apache Hadoop contributor, Roman Shaposhnik (7:00-8:00 pm)
In this talk, I will go through the evolution of Hadoop and its ecosystem projects and will try to peer into the crystal ball to predict what may be coming down the pike. I will discuss various ways of crunching the data on Hadoop (MapReduce, OpenMPI, Spark and various SQL engines) and how these tools complement each other.
Apache Hadoop is no longer just a faithful, open source, scalable implementation of two seminal papers that came out of Google 10 years ago. It has evolved into a project that provides enterprises with a reliable layer for storing massive amounts of unstructured data (HDFS) while allowing different computational frameworks to leverage those datasets.
The original computational framework (MapReduce) has evolved into a much more scalable set of general purpose cluster management APIs collectively known as YARN. With YARN underneath, MapReduce is still there to support batch-oriented computations, but it is no longer the only game in town. With OpenMPI, Spark, and Tez rapidly becoming available, now is truly the most exciting time to be a developer in the Hadoop ecosystem. It is also the time when you don't have to be employed by Yahoo!, Facebook or eBay to have access to mind-blowing compute power. That power is a credit card and a pivotal.io account away from anybody on the planet.
I will conclude by outlining some of the ongoing work that makes Hadoop and its ecosystem projects first-class citizens in cloud environments, based on the work that Pivotal engineers have done integrating Hadoop into PivotalONE PaaS.
Bio: Roman Shaposhnik is a Sr. Manager of Hadoop Open Source Platform at Pivotal Inc. He is a committer on Apache Hadoop, and holds a Chair position in Apache Bigtop and Apache Incubator projects. Roman has been involved in Open Source software for more than a decade and has hacked projects ranging from the Linux kernel to the flagship multimedia library known as FFmpeg. He grew up in Sun Microsystems where he had an opportunity to learn from the best software engineers in the industry. Roman's alma mater is St. Petersburg State University, Russia, where he studied to be a mathematician.
Hands-on Pivotal Hadoop (8:00-9:00 pm)
Come Test Drive The World's Most Advanced Analytical Platform on The World's Largest Public Hadoop Cluster:
We will introduce the most recent capabilities of Pivotal HD, the HAWQ SQL Query Engine, and GemfireXD in-memory capabilities over HDFS. In this hands-on lab, Pivotal will provide attendees with one-time access to test drive the 1000-node Analytics Workbench cluster, the world's largest publicly accessible Hadoop platform. We will walk through sample data sets, analytical query packages, and methods, and then attendees will be set loose to enjoy our leading technologies on this one-of-a-kind platform.
Network connectivity: Ability to connect to our guest wireless.
Access: Ability to perform ssh.
- On a Mac, this is built in.
- For Windows, an ssh app is required. PuTTY is usually a good recommendation.