What we're about

This is a developer-centric meetup focused on Apache Spark, Apache Flink, Apache Kafka, Apache Mesos, related Typesafe and Twitter OSS stacks, and broader distributed Data Science and Machine Learning. We welcome all OSS developers, vendors, consultants, and startups, whether they use these tools or build and support them, to attend, present, and organize.

How we complement the original Spark Users group, now the Bay Area Spark Meetup: we look at Spark in its end-to-end ecosystem (Mesos, Akka, Kafka, Cassandra, etc.), with a focus on what works for the final goals of the whole pipeline. We will teach you how to use Scala for Spark to make you more effective, and consider devops options so you can get to production faster. We'll invite projects relevant to or inspired by Apache Spark, such as Apache Storm, Apache Flink, and others, and will focus on putting useful OSS together as a system.

Upcoming events (2)

LLMOps: Test-Driven Development for Large Language Model Applications

Thank you to our host Pulze.ai!
Co-founder and CEO Fabian Baier will introduce Pulze.ai.
Thank you to our sponsor Airbyte for food, drinks, and recording support!
Sponsor introduction by Michel Tricot, Airbyte CEO.

NOTE: you have to register on Eventbrite to get in!

Josh Tobin is the founder and CEO of Gantry. Previously, Josh worked as a deep learning & robotics researcher at OpenAI and as a management consultant at McKinsey. He is also the creator of Full Stack Deep Learning (fullstackdeeplearning.com), the first course focused on the emerging engineering discipline of production machine learning and LLM applications. Josh did his PhD in Computer Science at UC Berkeley, advised by Pieter Abbeel.

Large language models are a powerful primitive for building applications quickly and easily. However, when it comes to robustness, reliability, and production readiness, they leave something to be desired.
If you've built applications with LLMs, you may have wondered, "isn't it a bit generous to call this prompt engineering?", "how do I know if this thing is actually working?", or "is it even possible to test these things?"
In this talk, we will present a more principled way to develop LLM applications using an approach that is analogous to test-driven development. We'll also show you how to get started with this approach in minutes using Gantry.

Airbyte is the leading open-source data integration platform that seamlessly syncs data from the largest catalog of APIs, databases, and files to various destinations. Airbyte differentiates itself through its open-source extensibility, its deployment options (cloud-hosted or self-managed), and its transparent, predictable pricing. Airbyte empowers AI-driven organizations to leverage all their data, whatever tools they use.

ducktape -- Scala productivity + Holden Karau + Scala Center!

LaunchDarkly

SF Scala is coming back in person!
NOTE: Scale By the Bay, the conference of all the meetups By the Bay including SF Scala, is back in November! The Second Chance CFP is running until June 16. Early Bird passes are also on sale.
We have three talks!
NOTE: You have to Register at Eventbrite in order to attend.
# ducktape - holding Scala's productivity together
Speaker: Aleksander Rainko, Software Engineer at Scalac
Bio: I've been a professional Scala programmer for 3 years now, and during this time I've developed a big interest in the metaprogramming and type-level programming side of the language. Amateurishly into compilers, programming languages overall, running, cycling and swimming.
Avid music listener.
He/him.
Abstract: Throughout my career as a developer I’ve noticed that the majority of my everyday tasks come down to moving JSON from point A to point B. All this time I’ve been trying to minimize the amount of (human-written) code that doesn’t bring joy (also known as boilerplate) while still solving the problem at hand.
After some time I managed to land on a subjectively close-to-perfect setup that involves:

  • generating server route definitions with guardrail or smithy4s,
  • a newtype and a refinement type library of your choice for that sweet, sweet typesafety and validation,
  • a mystery ingredient that abstracts away data transformations from generated code to my squeaky clean business domain model.

Now, what might that mystery ingredient that does all of that glue code magic for you be? Well, it’s ducktape (figuratively and literally).
Join me as I unveil the intricacies of this cool, little, macro-based library and show off how all of the above pieces fit together to create an extremely productive workflow for the most common of use cases - all of that in Scala 3.
# Building Reliable Data Pipelines
Holden Karau is an American-Canadian computer scientist and author based in San Francisco, CA. She is best known for her work on Apache Spark, her advocacy in the open-source software movement, and her creation and maintenance of a variety of related projects including spark-testing-base.
# Contributing to Scala
Speakers: Anatolii Kmetiuk, James Thompson and Guillaume Martres, Scala Center
Scala started as a research project and quickly evolved into a mature technology and a thriving ecosystem, thanks to the combined power of the open-source community and the software industry.
One of our missions at the Scala Center is to guide and support all contributors in developing the future of Scala in a productive and harmonious way. Whether you are a Scala enthusiast, a professional Scala developer, or a stakeholder in the Scala industry, there are ways you can get involved that will benefit you and the community.
In this talk, you will hear about:

  • The mission of the Scala Center
  • The ongoing projects we are working on with the community
  • Ways to get involved and contribute to Scala

This is also an opportunity to meet and chat with the Scala Center team.

Past events (74)

Scale By the Bay 2021
