
Airflow, Streaming and more

Hosted By
Chester C.

Details

Thanks to Mark Grover from Lyft for helping organize this event. We have invited two speakers, Max and Gwen, to discuss two separate data engineering topics.

Important Note: You must register for the event (free) on ti.to (https://ti.to/big-data/airflow-and-big-data) at least 48 hours before the event. You will then be sent an eNDA, which must be signed 24 hours before the event for security reasons. Signing the eNDA means a badge will be pre-printed and waiting for you when you arrive. Please register here: https://ti.to/big-data/airflow-and-big-data

Talk #1: Advanced Data Engineering Patterns with Apache Airflow

Analysis automation and analytic services are the future of data engineering! Apache Airflow's DSL makes it natural to build complex DAGs of tasks dynamically, and many organizations have been leveraging this feature in intricate ways, creating a wide array of services as dynamic workflows. In this talk, we'll explain the mechanics of dynamic pipeline generation using Apache Airflow, and present advanced use cases.
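To give a rough flavor of what "dynamic pipeline generation" means here, the sketch below builds tasks in a loop and wires their dependencies programmatically, the way an Airflow DAG file would. It is a self-contained illustration only: plain Python classes stand in for Airflow's operators and its `>>` dependency syntax, and the table names are made up.

```python
class Task:
    """Minimal stand-in for an Airflow operator (illustration only)."""
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # Mimic Airflow's `>>` operator: self must run before other.
        self.downstream.append(other)
        return other

# Dynamically generate one extract task per source table, all feeding
# a single aggregate task -- a fan-in built in a loop rather than by hand.
tables = ["users", "rides", "payments"]  # hypothetical source tables
aggregate = Task("aggregate")
extracts = []
for table in tables:
    t = Task(f"extract_{table}")
    t >> aggregate
    extracts.append(t)

print([t.task_id for t in extracts])
# → ['extract_users', 'extract_rides', 'extract_payments']
```

Because the task list is ordinary Python data, the same file can emit ten tasks or ten thousand depending on configuration, which is the property the talk's "dynamic DAGs" refer to.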

Speaker: Maxime Beauchemin

Maxime works at Lyft on the Data Platform team, developing open source products that reduce friction and help generate insights from data. He is the creator and a lead maintainer of Apache Airflow [incubating] (a workflow engine) and Superset (a data visualization platform), and is recognized as a thought leader in the data engineering field. Before Lyft, Maxime worked at Airbnb on open source data tools, at Facebook on computation frameworks powering engagement and growth analytics, at Yahoo! on clickstream analytics, and at Ubisoft as a data warehouse architect.

Talk #2: Stream All Things - Patterns of Modern Data Integration

80% of the time on every project is spent on data integration: getting the data you want the way you want it. This problem remains challenging despite 40 years of attempts to solve it. We want a reliable, low-latency system that can handle varied data from a wide range of data management systems. We want a solution that is easy to manage and easy to scale. Is that too much to ask?

In this presentation, we'll discuss the basic challenges of data integration and introduce design and architecture patterns used to tackle them. We will explore how these patterns can be implemented using Apache Kafka and share pragmatic solutions that many engineering organizations have used to build fast, scalable, and manageable data pipelines.
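The core pattern behind Kafka-style integration is an append-only log that decouples producers from consumers: each consumer tracks its own offset, so the same data can feed many downstream systems at different speeds. As a toy, self-contained sketch of that idea (plain Python, not the Kafka API; the event names are made up):

```python
class Log:
    """Toy append-only log standing in for a Kafka topic partition."""
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # the record's offset

    def read(self, offset):
        # Consumers read from wherever they left off; nothing is popped,
        # so slow and fast consumers share the same data independently.
        return self.records[offset:]

topic = Log()
for event in ["ride_requested", "ride_accepted", "ride_completed"]:
    topic.append(event)

slow_consumer_offset = 0   # e.g. a search indexer that fell behind
fast_consumer_offset = 2   # e.g. an analytics job that caught up

print(topic.read(slow_consumer_offset))  # replays the full history
print(topic.read(fast_consumer_offset))  # sees only the newest record
```

Because reads are non-destructive, adding a new downstream system is just a new consumer starting at offset 0, which is what makes this pattern attractive for integration.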

Speaker: Gwen Shapira

Gwen Shapira is a system architect at Confluent, where she helps customers achieve success with their Apache Kafka implementation. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen currently specializes in building real-time reliable data-processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, the coauthor of Hadoop Application Architectures, and a frequent presenter at industry conferences. She is also a committer on Apache Kafka and Apache Sqoop. When Gwen isn’t coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.

Agenda:
6:00 - 6:30 pm: Check-in, settling, and networking
6:30 - 6:35 pm: Intros
6:35 - 7:20 pm: Talk #1 (45 mins)
7:25 - 8:10 pm: Talk #2 (45 mins)
8:10 - 8:30 pm: Networking and wrap-up

SF Big Analytics
Lyft HQ
185 Berry Street · San Francisco, CA