The title fits in multiple senses: we will be convening at Lyft in SF, and we will also hear about how Beam is being used and about the efforts towards portability (especially with Python). The precise schedule and talks will be worked out soon; details will be updated here.
6:00 - 6:30 pm: Check in, food, networking
6:30 - 6:35 pm: Intros
6:35 - 7:10 pm: Talk #1
7:15 - 7:50 pm: Talk #2
7:55 - 8:30 pm: Talk #3
8:30 - 8:45 pm: Wrap up
Talk #1: Overview of Apache Beam, and TensorFlow Extended (TFX) with Apache Beam
Apache Beam is a set of portable SDKs (Java, Python, Go) for constructing streaming and batch data processing pipelines that can be written once and executed on any supported runtime. Tyler will give an overview of the project, with a focus on the current community efforts towards completing the vision laid out when the project was founded: providing full cross-language portability across supported execution engines.
Learn how the TensorFlow Extended (TFX) project is utilizing Apache Beam to simplify pre- and post-processing for ML pipelines. TFX provides a framework for managing all of the necessary pieces of a real-world machine learning project beyond simply training and utilizing models. Robert will provide an overview of TFX and talk in a little more detail about the pieces of the framework (tf.Transform and tf.ModelAnalysis) that are powered by Apache Beam.
Tyler Akidau is a staff software engineer at Google Seattle. He leads technical infrastructure’s internal data processing teams (MillWheel & Flume), is a founding member of the Apache Beam PMC, and has spent the last seven years working on massive-scale data processing systems. He is the author of the 2015 Dataflow Model paper and the Streaming 101 and Streaming 102 articles on the O’Reilly website. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.
Robert Crowe is a Developer Advocate at Google focused on TensorFlow.
Talk #2: Python Streaming Pipelines with Beam on Flink
Apache Beam established a unified programming model for data processing. Its vision bridges not only the gap between batch and streaming, but also the gaps across languages (Java, Python, Go, ...) and execution engines. To accomplish all of this, Beam requires cross-language support.
Over the past year, the Beam community has moved much closer to realizing this cross-language support, enabling pipelines written in Python to run on Apache Flink, with more Beam runners to follow. Lyft has contributed to this effort and is about to launch its first production Python streaming use case on this new stack.
Python running natively with all its libraries, on the JVM-based Flink runner? Let's take a look at how the magic works and what comes next!
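As a rough sketch of what this looks like in practice (a config sketch under assumptions, not the speaker's code: it presumes a Beam Flink job server is running locally on its default port 8099), the same Python pipeline is pointed at Flink purely through pipeline options:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Config sketch: submit a Python pipeline to Flink via Beam's portability
# framework. Assumes a Beam Flink job server is reachable at localhost:8099.
options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK",  # run user code in the submitting process
])

with beam.Pipeline(options=options) as p:
    (
        p
        | beam.Create([1, 2, 3])
        | beam.Map(lambda x: x * x)
        | beam.Map(print)
    )
```

The pipeline body is unchanged from what would run on the local DirectRunner; only the options differ, which is the point of the portability work the talk describes.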
Thomas Weise is a Software Engineer, Streaming Platform at Lyft. He is an Apache Flink committer, a PMC member of Apache Apex and Apache Beam, and has contributed to several other projects in the ecosystem.
Talk #3: Dynamic pricing of Lyft rides using streaming
At the core of Lyft is how we dynamically price our rides - a combination of various data sources, ML models, and streaming infrastructure for low latency, reliability, and scalability. This brief talk will cover the infrastructure Lyft uses to calculate Prime Time, why we were motivated to transition to a streaming architecture using Beam, what our new pipeline looks like, and what we've learned along the way.
Amar Pai is a software engineer on the Lyft pricing team. He's spent the last two decades doing software development in Silicon Valley, including 3 years at Lyft and earlier stints at Goodreads (a startup acquired by Amazon), Google, YouTube, and EA.