Upcoming events (2)
Note: This meetup event is being organized as a special joint effort with the NYC Apache Airflow Meetup group: https://www.meetup.com/NYC-Apache-Airflow-Meetup/events/260257700/

Schedule:
6:00 - Doors & Food
6:30 - Talk 1
7:15 - Talk 2
7:45 - Wrap & Chat

Talk 1: Data Validation and Alerting. How does Airflow fit in?

Abstract: After your ETL runs, a new kind of fun starts.
- Is my output data 'right' compared to my 'source of truth'?
- Wait a second, how do I even know if my input data was ok?
- How do I get alerted if a metric violates some threshold/tolerance or if some dimensional data is messed up?
- What if I want alerts to be triggered based on dynamic thresholds?
- How hard is it to maintain my checks and alerts?

Like everyone else, the New York Times' Data Engineers, Data Analysts and Data Scientists have been wrestling with the above questions. This presentation will cover what the Times has tried and the approach that's been settled on (for now). And yes, Airflow plays an important part.

Presenters:
Brian Lavery, Data Engineer, New York Times
Mariam Melikadze, Manager-Advertising Analytics, New York Times

Talk 2:

Abstract: Apache Airflow is a Python-based task orchestrator that has seen widespread adoption among startups and enterprises alike to author, schedule, and monitor data workflows. By deploying the Airflow stack via Helm on Kubernetes, fresh environments can be easily spun up or down, scaling to near 0 when no jobs are running. As companies scale up their Airflow usage, they need more control and observability over their stack as it becomes more ingrained into their culture and more important to the business. This talk will go through the technical challenges of supporting thousands of Airflow deployments, how to monitor them, how to reliably push DAG updates, and how to build all the supporting infrastructure of a rock-solid Airflow system in a cloud native environment using open source software.
Presenter: Viraj Parekh, Data Engineer, Astronomer

Instructions to follow upon arrival: Enter the lobby on the north side of the building. A representative will be waiting next to one of the north end elevator turnstiles with a sign that says 'Airflow Meet-Up'. They will help you get through security and send you up to the 15th floor, where another representative will be waiting to direct you to the room.
Schedule:
6:00 - Doors & Food
6:30 - Talk 1
7:15 - Talk 2
7:45 - Wrap & Chat

Talk 1: An Opinionated Guide to Building an AWS-hosted Data Platform

Presenter: Tom LeRoux, VP of Data Engineering and Analytics @ Disney Streaming

Abstract: These days there are many ways to build a cloud-based data warehouse. While AWS makes it easier to deploy infrastructure, it does not provide a prescriptive way to build out a data and analytics platform that meets the needs of both data producers and data consumers. In this talk we will dive into the particular design biases that helped us choose our data architecture for The Walt Disney Company’s direct-to-consumer video businesses globally, including the ESPN+ premium sports streaming service and Disney+, the upcoming Disney subscription video service. We will dig into the different patterns of streaming and batch data ingestion, and talk about how different types of data are transformed and made available to the organization.

Bio: Tom LeRoux is VP of Data Engineering at Disney Streaming Services. Tom joined DSS in July of 2018 and runs the data platform that powers Disney+ and ESPN+. Prior to DSS, Tom worked at Goldman Sachs, where he led the team that built Goldman's new consumer banking data and analytics platform.

Talk 2: TBD