Past Meetup

Building Beyond MVP Data Infrastructure

78 people went

Details

Talk #1: Architecting Cloud Native Apache Airflow

Apache Airflow is the most popular and effective open-source tool for managing workflows in Python, and it is used by startups and the Fortune 100 alike. But operationalizing it at scale for a growing team is easier said than done while questions around security, resource monitoring, system tolerance, testing, and deployment still linger.

This talk will cover the supporting stack needed to run Airflow in a cloud-native environment. Topics will include orchestration with Kubernetes, logging with Elasticsearch, monitoring with Prometheus and Grafana, service token creation and integration with CI services, and role-based authentication.

Presenter: Greg Neiheisel - CTO & Co-Founder, Astronomer
Greg started his career building apps for Great American Insurance Group before leaving to become a partner at Differential Dev Group, where he helped the company become one of the earliest adopters of Meteor. He left in 2015 to help launch Astronomer, The Airflow Company, and has been CTO ever since. Greg works in a mix of Node, Python, and Go and is an expert in Docker, Kubernetes, and, of course, Airflow.

---

Talk #2: Monitoring the Data Lake: Detecting Problems in Data Pipelines

The fundamental problem a data engineer solves is ensuring that the data pipeline is working. Data engineers must answer questions like: Are data flows operating normally? Do my data tables contain the correct results? Can data apps access the data quickly?

This talk will focus on best practices for monitoring data flowing through a data lake architecture. Topics will include performance monitoring, data quality monitoring, and end-user monitoring. We’ll also cover which metrics you need and how to collect them.

Presenter: Paul Lappas - CEO & Co-Founder, intermix.io
Paul is the CEO and Co-Founder of intermix.io, a single dashboard that lets data engineers monitor their mission-critical data flows. Paul holds multiple patents in cloud computing and performance analytics.

---

Recording: The event will be recorded, and the recording will be distributed afterwards along with copies of the slides. Depending on availability, a livestream may also be offered during the event for registered attendees.

There will be an opportunity for up to two lightning talks of 5-10 minutes each. If you are interested, please submit your topic to the event organizers.