Past Meetup

Big Data Science @ Strata + Hadoop World


119 people went


7:00 P.M. - 7:15 P.M. Networking / Introduction

7:15 P.M. - 7:45 P.M. Session 1

Title: The Bottom Line in Big Data

Speaker: Ashish Dubey, Sr. Director and Solutions Architect

Abstract: Big data engines like Hadoop and Spark are known to work well when running on homogeneous clusters. Homogeneity lets the underlying resource manager place tasks on nodes optimally and lets users tune their jobs for the configuration of a single machine type. While this setup has performance benefits, in many use cases it can also raise the total cost of ownership (TCO), because users cannot pick and choose instances for a given cluster based on the workload and on the availability and cost of different instance types.

Today we will discuss the availability of heterogeneous clusters on Qubole Data Service (QDS). You can mix and match EC2 instances of different types within the same cluster, leading to significant cost savings and more reliable clusters. These instances can come from different instance families as well as from different purchasing options such as On-Demand and Spot. Furthermore, with Qubole's Smart Auto Scaling, which matches cluster utilization to the workload so that no compute resources are wasted, we've observed cost savings of over $300K per year for just one cluster.

Speaker Bio: Ashish Dubey is a Solutions Architect at Qubole with about 13 years of industry experience across various technology domains. Before joining Qubole, Ashish spent four years at Microsoft, where he contributed to Windows XP development. He then worked in Teradata's consulting division, building several large-scale BI and big data systems for Fortune 500 clients in industry verticals such as finance, healthcare, retail, and multimedia. For the last three years, Ashish has been helping Qubole customers build large-scale data solutions using technologies such as Spark, Hadoop, and Presto.

7:45 P.M. - 7:55 P.M. Q/A

8:00 P.M. - 8:30 P.M. Session 2

Title: Data Science, Machine Learning, and Hadoop

Speakers: Sean Anderson, Senior Manager, Data Science and Engineering, Cloudera; and Jordan Volz, Systems Engineer, Cloudera

Abstract: Apache Hadoop and Apache Spark allow data scientists to use massive amounts of new data and compute to deliver better machine learning models faster. In practice, however, most data science still runs in isolated environments with poor access to Hadoop data. Modern data science needs popular R and Python packages and the freedom to customize environments.

In this session, we’ll explore the common, specific, real-world technical challenges facing both audiences. We’ll also discuss relevant improvements coming to the Hadoop ecosystem and best practices for configuring a data science environment, and we’ll introduce new tools designed to make self-service data science a reality.

Speaker Bios: Sean is a tenured infrastructure scaling and cloud strategy consultant with a strong focus on strategic partnerships and innovative hybrid technology. He has been part of major shifts in technology, including the rise of cloud computing, open source standardization, and big data. Sean quickly became a go-to resource and speaker on data-specific workloads, focusing on technologies such as Hadoop, MongoDB, Redis, Elasticsearch, SQL, and data warehousing. At Rackspace Hosting, Sean helped build and launch open source cloud platforms around Hadoop, MongoDB, Elasticsearch, and Redis. Sean is currently a manager for IT Solutions at Cloudera, the pioneer of Apache Hadoop.

Jordan Volz is a Systems Engineer at Cloudera. He helps clients across a variety of industry verticals design and implement big data solutions using Cloudera’s Distribution of Hadoop. Previously, he worked as a consultant for HP Autonomy, delivering compliance archiving, e-Discovery, and electronic surveillance solutions to regulated financial services companies, and as a developer at Epic Systems, building HIPAA-compliant EMR software.

8:30 P.M. - 8:45 P.M. Q/A

8:45 P.M. - 9:00 P.M. Networking

As this Meetup is taking place at Strata + Hadoop World, all attendees must observe O'Reilly's Code of Conduct (CoC).

We appreciate your part in making this community event a safe and productive environment for everyone.

This event is sponsored by: