What we're about
Upcoming events (2)
Schedule:
6:00 - Doors & Food
6:30 - Talk 1
7:15 - Talk 2
7:45 - Wrap & Chat

Talk 1: Pros and cons of containerizing data workflows (and how to have the best of both worlds)
Speaker: Tian Xie, Data Engineer @ Enigma
Abstract: At Enigma, we run over one hundred workflows to ingest public data into our system. Running so many workflows also means managing dependencies and deployment for each of them. Over time, we have iterated through several solutions to this problem, and this is our story. Spoiler: Docker is involved, but (*plot twist*) it only leads to another set of problems in the 2nd act.
Bio: Tian Xie has been working in the NYC tech start-up scene for the last eight years on consumer video rendering, on-demand shipping, and now data engineering at Enigma Technologies.

Talk 2: Building a Data Pipeline with Testing in Mind
Speaker: Jiaqi Liu, Software Engineer @ Button, Inc
Abstract: It’s one thing to build a robust data pipeline process in Python, but it is a whole other challenge to find tooling and build out the framework that allows for testing a data process. In order to truly iterate on and develop a codebase, one has to be able to confidently test during the development process and monitor the production system. In this talk, I hope to address the key components for building out end-to-end testing for data pipelines by borrowing concepts from how we test Python web services. Just as we want to check for healthy status codes from our API responses, we want to be able to check that a pipeline is working as expected given the correct inputs. We’ll talk about key features that allow a data pipeline to be easily testable and how to identify time-series metrics that can be used to monitor the health of a data pipeline.
Data Council (https://www.datacouncil.ai) is coming to San Francisco. Will you join us? The main event was born out of a meetup group similar to this one, and we're excited to have become a cornerstone of the growing data community on meetup.

What you will get out of Data Council SF 2019 (https://www.datacouncil.ai/san-francisco-2019):
- 2 days & 50+ insightful talks by leading data scientists and engineers from top companies like Facebook, Salesforce, IBM, Netflix, Google, WeWork, Lyft, Stitch Fix, Datadog, Segment, Datacoral, Stanford University and many more.
- 6 unique tracks: Data Platforms & Pipelines, Databases & Tools, Data Analytics, Machine & Deep Learning, and our all-new tracks: Hero Engineering and AI Products.
- All-new content, including our brand new Founders Panel of top founders in the data space.
- Extensive networking opportunities at the conference, plus the chance to connect with speakers & attendees at our Wednesday night after-party between conference days.
- Small group Speaker Office Hours following each talk, with an opportunity to dive deeper into the subject matter 1:1 with the speaker.
- Attendees who are highly technical data scientists, engineers, analysts & technical founders from top tech, media, and finance companies around the SF area.
- A chance to connect with our great partner companies at Sponsor Spotlight to discover their available data jobs and latest product developments.

This year Data Council San Francisco ’19 takes place on April 17 & 18. As members of this meetup group and our community, I want to offer you a sweet deal: tickets for $100 less than our lowest early bird pricing. To redeem your $100 discount, go to https://www.datacouncil.ai/san-francisco-2019 and use coupon code: 100offeb
Why should you join this year? If you believe in Quality Content > $ and would like to learn from companies like Facebook, Apache Foundation, Google, Netflix, Salesforce, Spotify, WeWork, Beeswax, Stitch Data, Capital One, Airbnb, Datadog, Lyft, Segment, Starburst, Datacoral, Columbia University, Uber, TapRecruit, Figure Eight, Dia&Co and many more, along with many awesome speakers, you should join!

Cheers,
-Pete