What we're about


DataCouncil.ai - NYC Data Engineering & Science hosts events where lead engineers & data scientists from pre-screened companies present talks on their projects for the benefit of the data community. NYC companies are doing amazing things with data and we've created this event to help them showcase their work and help other engineers level up.

What you'll gain from this group:

* a better understanding of the best open source tools to use in building your own data infrastructure (pipelines, platforms, storage options, etc.)

* real-world examples from top technical companies explaining the data platforms their engineers are building

* applied data science talks that explain the minimum level of algorithmic understanding engineers should have to work better with their data science counterparts, or to become a Machine Learning Engineer-unicorn themselves

* highly technical conversations with other attendees, and announcements of the best data events and opportunities in the NYC area

Want to give a talk? Please submit it for consideration here (https://petesoder.typeform.com/to/xpbNUJ).

This group is an independently organized Data Council (https://www.datacouncil.ai/) group and is sponsored by Data Council.

Other DataCouncil.ai Local Meetups can be found here:


Upcoming events (2)

Data Validation and Alerting. How does Airflow fit in?

New York Times Building

Note: This meetup event is being organized as a special joint effort with the NYC Apache Airflow Meetup group: https://www.meetup.com/NYC-Apache-Airflow-Meetup/events/260257700/

Schedule:
6:00 - Doors & Food
6:30 - Talk 1
7:15 - Talk 2
7:45 - Wrap & Chat

Talk 1: Data Validation and Alerting. How does Airflow fit in?

Abstract: After your ETL runs, a new kind of fun starts.
- Is my output data 'right' compared to my 'source of truth'?
- Wait a second, how do I even know if my input data was ok?
- How do I get alerted if a metric violates some threshold/tolerance or if some dimensional data is messed up?
- What if I want alerts to be triggered based on dynamic thresholds?
- How hard is it to maintain my checks and alerts?

Like everyone else, the New York Times' Data Engineers, Data Analysts and Data Scientists have been wrestling with the above questions. This presentation will cover what the Times has tried and the approach that's been settled on (for now). And yes, Airflow plays an important part.

Presenters:
Brian Lavery, Data Engineer, New York Times
Mariam Melikadze, Manager-Advertising Analytics, New York Times

Talk 2

Abstract: Apache Airflow is a Python-based task orchestrator that has seen widespread adoption among startups and enterprises alike to author, schedule, and monitor data workflows. By deploying the Airflow stack via Helm on Kubernetes, fresh environments can be easily spun up or down, scaling to near 0 when no jobs are running. As companies scale up their Airflow usage, they need more control and observability over their stack as it becomes more ingrained into their culture and more important to the business. This talk will go through the technical challenges of supporting thousands of Airflow deployments, how to monitor them and reliably push DAG updates, and how to build all the supporting infrastructure of a rock-solid Airflow system in a cloud native environment using open source software.

Presenter: Viraj Parekh, Data Engineer, Astronomer

Instructions to follow upon arrival: Enter the lobby on the north side of the building. A representative will be waiting next to one of the north end elevator turnstiles with a sign that says 'Airflow Meet-Up'. They will assist you in getting through security and send you up to the 15th floor, where another representative will be waiting to direct you to the room.

Building an AWS-hosted Data Platform

Disney Streaming

Schedule:
6:00 - Doors & Food
6:30 - Talk 1
7:15 - Talk 2
7:45 - Wrap & Chat

Talk 1: An Opinionated Guide to Building an AWS-hosted Data Platform

Presenter: Tom LeRoux, VP of Data Engineering and Analytics @ Disney Streaming

Abstract: These days there are many ways to build a cloud-based data warehouse. While AWS makes it easier to deploy infrastructure, it does not provide a prescriptive way to build out a data and analytics platform that meets the needs of both data producers and data consumers. In this talk we will dive into particular design biases that helped us choose our data architecture for The Walt Disney Company’s direct-to-consumer video businesses globally, including the ESPN+ premium sports streaming service and Disney+, the upcoming Disney subscription video service. We will dig into the different patterns of streaming and batch data ingestion, and talk about how different types of data are transformed and made available to the organization.

Bio: Tom LeRoux is VP of Data Engineering at Disney Streaming Services. Tom joined DSS in July of 2018 and runs the data platform that powers Disney+ and ESPN+. Prior to DSS, Tom worked at Goldman Sachs, where he led the team that built Goldman's new consumer banking data and analytics platform.

Talk 2: TBD

Past events (50)

Data Council San Francisco 2019

San Francisco

