Practical tips on running Spark on Hadoop & Machine Learning in the Wild

Hosted By
Alina P. and Satyendra R.

Details

A frequently asked question in the Big Data community is "Should I go for Hadoop or Spark?" This meeting addresses that question and offers some insight into where these systems are headed. We also have a talk on machine learning.

Talk 1 (30 Minutes)

Practical Tips for successfully running Spark in a Hadoop Cluster

The meeting will start with a talk by Chris Putnam of Cloudera on running Spark on YARN, with tips for running Spark successfully on a Hadoop cluster. His presentation takes a high-level look at Spark from an administrator’s perspective and covers key concepts and tips for supporting Spark users on the cluster.

His talk will be followed by a talk on Machine Learning by Chuck Anderson.

Talk 2 (40 Minutes)

Machine learning in the wild: a cautionary tale

This talk compares the logistic regression performance of five open-source packages, focusing primarily on Spark. We cover specific cases, applying Spark 1.4 binary classification (logisticSGD) to several example datasets. We examine the results to illustrate pitfalls that can arise, how to be mindful of them, and ways to avoid them. There will be a live demonstration running Spark on the example data.

About Speakers:

Chris Putnam is a Systems Engineer with Cloudera. He has over 10 years of experience in operational analytics, focused on Customer Analytics and Analytics for IT Operations. He currently works to demystify Hadoop and Big Data for organizations and to help their IT teams operationalize their big data applications and systems.

Chuck Anderson's career has spanned research, engineering, and product development in pure physical sciences, nonlinear laser spectroscopy, applied sensors, medical device instrumentation, and predictive analytics. For as long as he can recall, Chuck has enjoyed building and breaking models. He currently uses predictive analytics, applied mathematics, data science, and other tools to build models for UFALLC, based in downtown Ann Arbor.

Many thanks to Cloudera and Masco Cabinetry for co-sponsoring this event.

Michigan Spark Users Group
Masco Cabinetry
4600 Arrowhead Drive · Ann Arbor, MI