What we're about

This group is for practitioners, developers, and aspiring and professional data engineers and data scientists in the greater Denver/Boulder area who are interested in learning about data + AI. Connect with fellow enthusiasts and learn more about open source projects including Apache Spark, Delta Lake, MLflow, Koalas, TensorFlow, Ray, PyTorch, and a variety of other distributed computing and data engineering tools.

We host a few formats of live online meetups, which we call out in the title of each event:

Interviews: interview-style conversation with time for Q&A; no slides

Tech Talks: presentation with slides, a demo, and time for Q&A

Workshops: hands-on tutorials with time for Q&A

Join us on Slack if you're interested in Delta Lake (https://dbricks.co/DeltaSlack) and/or MLflow (https://dbricks.co/MLflowSlackInvite).

Databricks hosts a few meetup groups, and the devrel team is always looking for community speakers. If you're interested in giving a talk, message the organizer team!

Upcoming events (1)

Ray: A Framework for Scaling and Distributing Python & ML Applications

Happy New Year, everyone. Our group is broadening our focus this year.

Our first talk of the year features Jules Damji, Lead Developer Advocate at Anyscale, as he discusses Ray: A Framework for Scaling and Distributing Python & ML Applications.

ABOUT THE TALK:

Modern machine learning (ML) workloads, such as deep learning and large-scale model training, are compute-intensive and require distributed execution. Ray is an open-source, distributed framework from U.C. Berkeley’s RISELab that easily scales Python applications and ML workloads from a laptop to a cluster, with an emphasis on the unique performance challenges of ML/AI systems. It is now used in many production deployments.

This talk will give an overview of Ray, covering its architecture, core concepts, and primitives, such as remote Tasks and Actors; briefly tour Ray's native libraries (Ray Tune, Ray Train, Ray Serve, Ray Datasets, RLlib); and survey Ray's growing ecosystem.
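
To make those primitives concrete, here is a minimal Python sketch of a remote Task and an Actor. The function and class names are illustrative, not taken from the talk:

import ray

ray.init()  # starts Ray locally; pass an address to join an existing cluster

# A Task: a stateless function that runs in a separate worker process.
@ray.remote
def square(x):
    return x * x

# An Actor: a stateful class whose methods run in a dedicated worker.
@ray.remote
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

# Remote calls return futures (ObjectRefs); ray.get blocks for the results.
print(ray.get([square.remote(i) for i in range(4)]))  # [0, 1, 4, 9]

counter = Counter.remote()
print(ray.get(counter.increment.remote()))  # 1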

Through a demo using XGBoost for classification, we will show how you can scale training, hyperparameter tuning, and inference from a single node to a cluster, with a tangible performance difference when using Ray.
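
Distributed XGBoost training on Ray typically goes through the xgboost_ray package; here is a minimal sketch under that assumption (the dataset and actor counts are illustrative, not the talk's demo):

from sklearn.datasets import load_breast_cancer
from xgboost_ray import RayDMatrix, RayParams, train

# Illustrative dataset; RayDMatrix shards the data across training actors.
X, y = load_breast_cancer(return_X_y=True)
train_set = RayDMatrix(X, y)

evals_result = {}
bst = train(
    {"objective": "binary:logistic", "eval_metric": ["logloss", "error"]},
    train_set,
    evals=[(train_set, "train")],
    evals_result=evals_result,
    # Two actors with one CPU each here; raise these on a real cluster.
    ray_params=RayParams(num_actors=2, cpus_per_actor=1),
)
print("Final training error:", evals_result["train"]["error"][-1])

The same call scales out by increasing num_actors once a Ray cluster is attached.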

The takeaways from this talk are:

Learn Ray's architecture, core concepts, primitives, and patterns
Why distributed computing will be the norm, not the exception
How to scale your ML workloads with Ray libraries:
- Training on a single node vs. a Ray cluster, using XGBoost with/without Ray (see the sketch above)
- Hyperparameter search and tuning, using XGBoost with Ray and Ray Tune (see the sketch after this list)
- Inference at scale, using XGBoost with/without Ray
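
For a flavor of the Ray Tune item above, here is a hedged sketch of tuning XGBoost hyperparameters with Ray Tune (Ray 1.x-style API); the dataset, search space, and metric are illustrative assumptions, not the demo from the talk:

import xgboost as xgb
from ray import tune
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

def train_xgboost(config):
    # Illustrative dataset; each Tune trial trains one model.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dtest = xgb.DMatrix(X_test, label=y_test)
    results = {}
    xgb.train(config, dtrain, evals=[(dtest, "eval")], evals_result=results)
    # Report the final validation error back to Tune.
    tune.report(error=results["eval"]["error"][-1])

analysis = tune.run(
    train_xgboost,
    config={
        "objective": "binary:logistic",
        "eval_metric": "error",
        "max_depth": tune.randint(2, 8),
        "eta": tune.loguniform(1e-3, 3e-1),
    },
    num_samples=8,  # Tune schedules the trials across the Ray cluster
)
print("Best config:", analysis.get_best_config(metric="error", mode="min"))

Each trial runs on Ray workers, so the same script scales from a laptop to a cluster just by pointing ray.init at a cluster address.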

ABOUT OUR SPEAKER:

Jules Damji is the Lead Developer Advocate at Anyscale Inc.

He is an MLflow contributor and co-author of Learning Spark, 2nd Edition. He is a hands-on developer with over 25 years of experience and has worked at leading companies such as Sun Microsystems, Netscape, @Home, Opsware/LoudCloud, VeriSign, ProQuest, Hortonworks, and Databricks, building large-scale distributed systems. He holds a B.Sc. and an M.Sc. in computer science (from Oregon State University and Cal State Chico, respectively) and an M.A. in political advocacy and communication (from Johns Hopkins University).

Past events (17)

How to Learn Apache Spark (ONLINE EVENT)

Online event
