Machine learning pipelines, Spark Packages, and getting to production

Hosted By
Andy K. and Reynold X.
Details

We have 3 talks lined up for this mega-meetup. Videos will be posted online after the meetup.

  • Spark Machine Learning Pipeline API: User and Developer’s Perspective
  • Spark Packages (spark-packages.org)
  • Getting Spark Customers to Production

For those interested, an additional reception will be held from 5pm to 6pm in the San Jose Ballroom (2nd floor) of the San Jose Marriott, which is connected to the San Jose Convention Center.

Also, Spark Summit takes place one week later (June 15/16), with a great set of talks from NASA, CIA, Netflix, Baidu, Airbnb, Microsoft, and more. Use the discount code “SFmeetup” to get 15% off registration. https://spark-summit.org/2015/schedule/

== Talk Descriptions ==

Spark Machine Learning Pipeline API: User and Developer’s Perspective (Ram Sriharsha, Hortonworks; Joseph Bradley, Databricks)

The ML pipeline API in Spark receives a significant boost in the 1.4 release. In this talk, we are going to describe its design and new features in 1.4, including feature extractors and transformers, linear models, decision trees, as well as meta-algorithms and hyper-parameter tuning. Most of the algorithms are available in Python, Java, and Scala. We start from a user's perspective and demo how to use the pipeline API to build and tune your own machine learning pipelines. Then we discuss developer-facing APIs and next steps.

Spark Packages (Burak Yavuz, Databricks)

http://spark-packages.org/

Spark Packages is a package index for users to share and find packages built on top of Spark. Existing packages include new data source support (e.g. CSV/Avro/HBase/Mongo), new machine learning algorithms, and new streaming connectors (Kafka, RabbitMQ).

In this talk, we will walk through some of the existing packages and how users can use these packages. We will also dive into an example of building and publishing a new package to be usable by all Spark users.

Getting Spark Customers to Production (Kostas Sakellis, Cloudera)

In this talk, we will cover common challenges faced by Apache Spark users running in production. First, we will briefly cover some of the difficulties in getting a Spark proof of concept off the ground. Next, we will discuss getting this Spark job to production. Topics will include:

-Common misconfigurations
-OOM exceptions as you increase your data load
-Security concerns
-Cluster utilization

We will finish the talk by discussing steps we are taking in the community to alleviate some of the challenges and what to expect in the future.

Bay Area Spark Meetup
150 W San Carlos St · San Jose, CA 95113