
Spark + AI Summit: Bay Area Apache Spark Meetup @ Moscone Center, SF

Hosted By
Jules S. D.

Details

Moscone Center Room 2014

Join us for an evening with the Bay Area Apache Spark Meetup at the Spark + AI Summit (https://databricks.com/sparkaisummit/north-america), featuring tech-talks from Databricks (https://databricks.com/), Uber (https://www.uber.com/), and Stanford University (https://www.stanford.edu/).

Thanks to Databricks for hosting and sponsoring this meetup.

(Note: This meetup is open to everyone. You don’t have to be registered for Spark + AI Summit.)

Agenda:

6:00 - 6:30 pm Mingling & Refreshments
6:30 - 6:40 pm Opening Remarks & Introductions, Jules Damji, Databricks
6:40 - 7:20 pm Tech Talk-1: Richard Garris, Databricks
7:20 - 8:00 pm Tech Talk-2: Alexander Sergeev, Uber
8:00 - 8:05 pm Short Break
8:05 - 8:45 pm Tech Talk-3: Peter Kraft, Stanford University
8:45 - 9:00 pm More Mingling & Networking

Tech-Talk 1: Understanding Parallelization of Machine Learning Algorithms in Apache Spark™

Abstract: Machine Learning (ML) is a subset of Artificial Intelligence (AI). In this talk, Richard Garris, Principal Architect at Databricks, will explain how various ML algorithms are parallelized in Apache Spark. Andrew Ng calls the algorithms the "rocket ship" and the data "the fuel that you feed machine learning" to build deep learning applications. We will start with an understanding of machine learning pipelines built with single-machine tools such as Pandas, scikit-learn, and R. Then we will discuss how Apache Spark MLlib can be used to parallelize your machine learning pipeline with Linear Regression and Random Forest. Lastly, we will discuss ways to parallelize single-machine algorithms in Spark by broadcasting the data and then performing distributed feature selection, model creation, or hyperparameter tuning.

Bio: Richard Garris is a Principal Solutions Architect at Databricks focused on helping clients with their Advanced Analytics initiatives using Apache Spark and MLlib. He has spent 13 years working with enterprises in data management and analytics. Richard earned his undergraduate degree at The Ohio State University and a master's degree in Software Management from CMU. His previous work experience includes Skytree, Google, and PwC.

Tech-Talk 2: Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow

Abstract: Horovod makes it easy to train a single-GPU TensorFlow model on many GPUs, both on a single server and across multiple servers. The talk will touch upon the mechanisms of deep learning training, the challenges that distributed deep learning poses, and the mechanics of Horovod, as well as the practical steps necessary to train a deep learning model on your favorite cluster.

Bio: Alex Sergeev is a Deep Learning Infrastructure Engineer at Uber working on scalable Deep Learning. He received his M.S. degree in Computer Science from Moscow Engineering Physics Institute. Before joining Uber, he was a Senior Software Engineer at Microsoft working on Big Data Mining.

Tech-Talk 3: Apache Spark™ and MacroBase

Abstract: In this talk, we present MacroBase, an analytics system we have built at Stanford University that uses Apache Spark to prioritize human attention via large-scale feature selection. In a world swamped with enormous datasets and an enormous variety of complex tools to analyze them, MacroBase specializes in one task: finding and explaining unusual or interesting trends in data as easily as possible.

Specifically, it searches for correlations in large-scale datasets. For example, an app developer wondering why their app was crashing could ask MacroBase to find factors in their logs that correlate with crash behavior and explain the crashes. Alternatively, an analyst looking for trends in time series data could ask MacroBase to find changes over time. MacroBase relies on Spark and Spark-SQL to provide fast and easy-to-use analytics. Users operate MacroBase using MacroBase-SQL, an extension of SQL that introduces new operators to partition datasets and find explanations on partitions. MacroBase-SQL is built on top of Spark-SQL, with its new operators taking in and returning Spark DataFrames. This means that MacroBase is fully distributed and can easily be integrated into any system already running Spark or Spark-SQL.

In this talk, we will explain how we built MacroBase's new operators on top of Spark and what you can do with them.

Bio: Peter Kraft is a first-year graduate student at Stanford advised by Peter Bailis and Matei Zaharia. He is interested in solving problems at the intersection of systems and machine learning and in building more usable and powerful machine learning systems.

Bay Area Spark Meetup
Moscone Center, San Francisco, CA
747 Howard Street · San Francisco, CA