
lakeFS 💛 DuckDB: Why is DuckDB all the rage in the Data Community?

Hosted by Ankit Srinivas

Details

DuckDB is coming to SF! Join us as we explore the possibilities of how lakeFS and DuckDB can improve your database management!

--------
🤝Organizer: DuckDB & lakeFS
📍Location: Trellis Co-working & Events (981 Mission St, San Francisco)
🍕Catering: Pizza & Drinks

👩🏻‍💻👨🏾‍💻 Who is it for:
This meetup is open to everyone in the data community who wants to learn and grow their Data Engineering & Data Science skills. Students, professionals, and career changers are all welcome to join and learn about the best strategies & workflows in the data community.

Agenda

  • 5:00pm - 5:30pm: Welcome/Networking
  • 5:30pm - 6:00pm: Quacking DuckDB - local and in the cloud
  • 6:00pm - 6:30pm: ML experimentation made easy with lakeFS & DuckDB
  • 6:30pm - 7:00pm: Networking

--------
Session 1: Quacking DuckDB - local and in the cloud
Ryan is at MotherDuck, where they’re working to make analytics ducking awesome by offering a serverless DuckDB. In this talk, Ryan will give an introduction to the open source DuckDB project, talk about how it’s used and some of the attributes which have made it take the internet by storm. You’ll see code and CLI, SQL and Python. He’ll also talk about some of the philosophies and beliefs which led to the creation of MotherDuck as a VC-backed startup and their partnership with the creators of DuckDB.
Big data is dead. Long live easy data.

Speaker: Ryan Boyd, Co-founder @MotherDuck
Ryan Boyd is a Boulder-based software engineer, data + authNZ geek and technology executive. He's currently a co-founder at MotherDuck, where they're making data analytics fun, frictionless and ducking awesome. He previously led developer relations teams at Databricks, Neo4j and Google Cloud. He's the author of O'Reilly's Getting Started with OAuth 2.0. Ryan advises B2B SaaS startups on growth marketing and developer relations as a Partner at Hypergrowth Partners. Prior to leading the Google Cloud Developer Relations team, he spent 7 years at Google working on 20+ different developer products and was the co-founder of Google Code Labs, which aimed to improve the quality and stability of Google's developer products. Ryan graduated with a degree in Computer Science from Rochester Institute of Technology (RIT), where he later worked full-time building web applications + APIs and architecting the central web hosting platform.

Session 2: ML experimentation made easy with lakeFS & DuckDB
Machine learning workflows are not linear: experimentation is an iterative, back-and-forth process between different components. This often involves experimenting with different data labeling techniques, data cleaning, preprocessing and feature selection methods during model training, just to arrive at an accurate model.

Exploring, cleaning and preprocessing training data is a significant part of getting ML right. However, exploring data is HARD! In this talk I’ll share how open source lakeFS embedded DuckDB to make data exploration easier, natively from within the lakeFS UI. By leveraging DuckDB, data practitioners get a simple, performant way to explore & clean data using SQL without having to run expensive and complex distributed systems, all within the same workflow and experience.

Quality ML at scale is only possible when we can reproduce a specific iteration of the ML experiment, and this is where data is key. To efficiently version ML experiments without duplicating code, data and models, data versioning tools are critical. Open source tools like lakeFS make it possible to version all components of ML experiments without the need to keep multiple copies and, as an added benefit, save you storage costs as well.

This talk will demonstrate, through a live code example:

  • Creating a basic ML experimentation framework with lakeFS (on Jupyter notebook)
  • Reproducing ML components from a specific iteration of an experiment
  • Building intuitive, zero-maintenance experiments infrastructure

Speaker: Vino Duraisamy, Developer Advocate @lakeFS
Vino is a developer advocate at lakeFS, an open-source platform that delivers a Git-like experience to object-store-based data lakes. She started as a software engineer at NetApp, then moved into the cloud and big data world and landed on the data teams of Nike and Apple. There she worked mainly on batch processing workloads as a data engineer, built custom NLP models as an ML engineer and even touched on MLOps for model deployments. Vino enjoys sharing her learnings and industry best practices through blogs, video tutorials and tech talks.

COVID-19 safety measures

Event will be indoors