Extending Spark ML for Custom Models (with Holden Karau)

Hosted by Seattle Data Geeks

This is a past event

51 people went

Details

This meetup is co-hosted with our friends at the Seattle Spark Meetup (https://www.meetup.com/Seattle-Spark-Meetup/). RSVP with either group; you don't need to RSVP with both. Special thanks to Blueprint Consulting Services (http://www.bpcs.com/) for hosting this meetup.

Spark Committer/Author and Global Data Geek friend Holden Karau will be passing through town, so we invited her to spend an evening talking Spark and Python.

Abstract

This is an updated version of Holden's Spark Summit West talk, with new material that includes Python support in addition to Scala.

Apache Spark’s machine learning (ML) pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren’t available yet. This talk introduces Spark’s ML pipelines, and then looks at how to extend them with your own custom algorithms. By integrating your own data preparation and machine learning tools into Spark’s ML pipelines, you will be able to take advantage of useful meta-algorithms, like parameter searching, as well as pipeline persistence (with a bit more work, of course).
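To make the idea of a custom pipeline stage concrete, here is a minimal sketch of the estimator/transformer pattern that Spark's ML pipelines are organized around: an estimator's fit() returns a fitted model, the model's transform() produces new data, and a pipeline chains stages together. This is plain Python, not Spark code. The class names (MeanImputer, MeanImputerModel, Scaler, Pipeline, PipelineModel) and the list-of-dicts data are stand-ins invented for illustration, and the sketch is not taken from the talk.

```python
# Pure-Python stand-in for the estimator/transformer pattern. These classes
# are hypothetical illustrations and are NOT part of pyspark.ml.
from statistics import mean


class MeanImputerModel:
    """Fitted stage (a transformer): fills missing values with a learned mean."""

    def __init__(self, column, fill_value):
        self.column = column
        self.fill_value = fill_value

    def transform(self, rows):
        # Build new rows rather than mutating the input.
        return [
            {**row, self.column: self.fill_value
             if row[self.column] is None else row[self.column]}
            for row in rows
        ]


class MeanImputer:
    """Unfitted stage (an estimator): learns the column mean from the data."""

    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        observed = [row[self.column] for row in rows
                    if row[self.column] is not None]
        return MeanImputerModel(self.column, mean(observed))


class Scaler:
    """Stateless transformer: multiplies a column by a fixed factor."""

    def __init__(self, column, factor):
        self.column = column
        self.factor = factor

    def transform(self, rows):
        return [{**row, self.column: row[self.column] * self.factor}
                for row in rows]


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chains stages; fitting it fits each estimator on the data so far."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):      # estimator: fit it first
                stage = stage.fit(rows)
            rows = stage.transform(rows)   # feed output to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)


training = [{"x": 1.0}, {"x": None}, {"x": 3.0}]
model = Pipeline([MeanImputer("x"), Scaler("x", 10.0)]).fit(training)
result = model.transform([{"x": None}, {"x": 4.0}])
print(result)
```

Because the custom stage follows the same fit/transform contract as every other stage, generic tooling such as a parameter search only needs to call fit() and transform() and does not need to know what the stage does internally. In Spark itself, custom stages also declare their parameters and operate on DataFrames, which this sketch leaves out.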

Even if you don’t have your own machine learning algorithms that you want to implement, this session will give you an inside look at how the ML APIs are built. It will also help you make even more awesome ML pipelines and customize Spark models for your needs, and it will give you a stronger background for future Spark ML projects.

Agenda

6:30 Meet and Greet / Networking
7:00 Announcements and Featured Talk
8:30 Adjourn for drinks