For this month's session, we have been invited by Capital One to meet at their West Creek campus. I'm excited to announce we will be joined by Tim Hunter and Matt Der -- both PhDs in machine learning -- who will share some perspectives on machine learning in Apache Spark.
We apologize for the short notice, but we hope you will be able to join us!
• 6:00 - 6:25 PM -- Food, Drinks, and Networking
• 6:25 - 6:30 PM -- Word from the Organizers
• 6:30 - 7:00 PM -- Tim Hunter, PhD (Databricks): Tuning and Monitoring Deep Learning on Apache Spark
• 7:00 - 7:05 PM -- Break
• 7:05 - 8:00 PM -- Matt Der, PhD (Notch): Large-Scale Price Optimization with Spark & Machine Learning
From 288 South, take the Capital One Drive exit.
From 288 North, take the West Creek Parkway West exit and then your first left onto Capital One Drive.
Follow the signs to "The Commons," which will take you past Central Parking. Park in the Central Parking deck.
When you exit the deck, you'll be facing two buildings with a concrete ramp between them. Walk up the ramp and enter "The Commons" on your left through the large glass doors.
After checking in with security (bring an ID), pass through the vestibule and turn left. A video monitor there will show the room assignment for the meeting.
Tuning and Monitoring Deep Learning on Apache Spark
Deep Learning on Apache Spark has the potential for huge impact in research and industry. This talk will describe best practices for building deep learning pipelines with Spark.
Rather than comparing deep learning systems or specific optimizations, this talk will focus on issues that are common to many deep learning frameworks when running on a Spark cluster: optimizing cluster setup and data ingest, tuning the cluster, and monitoring long-running jobs. We will demonstrate the techniques we cover using Google’s popular TensorFlow library.
More specifically, we will cover typical issues users encounter when integrating deep learning libraries with Spark clusters. Clusters can be configured to avoid task conflicts on GPUs and to allow using multiple GPUs per worker. Setting up pipelines for efficient data ingest improves job throughput. Interactive monitoring makes it easier both to configure deep learning jobs and to check their stability.
Tim Hunter is a software engineer at Databricks and contributes to the Apache Spark MLlib project. He has been building distributed machine learning systems with Spark since version 0.2, before Spark was an Apache Software Foundation project.
Large-Scale Price Optimization with Spark & Machine Learning
Apache Spark is a leading technology for providing data science solutions to big data problems. Notch CTO Matt Der will present a real-world use case from a recently completed engagement with EnterBridge Technologies -- a partner in the Richmond tech community. EnterBridge performs wholesale pricing analysis for distributors and manufacturers across a range of industries. Business analysts currently use a rule-based approach with manual feature selection and filters to determine "price groups" that indicate which products can be sold at higher margins. Notch framed this task as a clustering problem and introduced a machine learning approach that is automated, well-principled, and highly customizable, giving EnterBridge opportunities to inject its domain expertise. This talk will walk through the deliverable, a Databricks Notebook, covering both high-level machine learning concepts and low-level implementation details in Spark.
Matt Der earned his PhD in machine learning from UC San Diego in 2015. While in California, he also worked at Google San Francisco on the Internal Privacy team. After completing his degree, Matt moved back to Richmond, VA to become Chief Technology Officer at Notch, where he works to advance the company's capabilities in data science and machine learning.