We will have two tech talks at this meetup. The talks will be recorded and the video posted afterwards. The agenda and abstracts are below:
6:30 - 7:00 pm Mingling & Refreshments
7:00 - 7:05 pm Introductions
7:05 - 7:40 pm Tech Talk - 1
7:45 - 8:20 pm Tech Talk - 2
8:30 - 8:45 pm Mingling
Tech-Talk-1: Introducing Apache PredictionIO (incubating)
Salesforce has generously donated PredictionIO to the Apache Software Foundation, fostering an even stronger collaboration with the community. Apache PredictionIO (incubating) provides a full-stack machine learning environment on top of Apache Spark, making it easy for developers to iterate on production-deployable machine learning engines. In this talk, we will go over the roadmap of Apache PredictionIO (incubating) and some of its recent developments.
Bio: Donald Szeto is a tech lead at Salesforce. In 2012 he cofounded PredictionIO (now part of Salesforce), which has become one of the most popular open source machine learning projects.
Tech-Talk-2: TensorFrames: TensorFlow on Spark DataFrames
Since the creation of Apache Spark, I/O throughput has increased at a faster pace than processing speed. In many big data applications, the bottleneck is increasingly the CPU. With the release of Apache Spark 2.0 and Project Tungsten, Spark runs a number of control operations close to the metal. At the same time, there has been a surge of interest in using GPUs (the graphics processing units of video cards) for general-purpose applications, and a number of frameworks have been proposed for numerical computation on GPUs.
In this talk, we will discuss how to combine Apache Spark with TensorFlow, a new framework from Google that provides building blocks for Machine Learning computations on GPUs. Through a binding between Spark and TensorFlow called TensorFrames, distributed numerical transforms on Spark DataFrames and Datasets can be expressed in a high-level language and still rely on highly optimized implementations.
The developers of the TensorFrames package will provide an overview, a live demo on Databricks, and a presentation of future plans. For experts, this talk will also include some technical details on design decisions, the current implementation, and ongoing work on speed and performance optimizations for numerical applications.
Bio: Tim Hunter is a software engineer at Databricks and contributes to the Apache Spark MLlib project. He has been building distributed Machine Learning systems with Spark since version 0.5, before Spark was an Apache Software Foundation project.