Applied LLMs at Picnic: Warehouses, Notebooks & Evaluation Loops


Details
Join our upcoming meetup for all things LLMs!
We've planned an evening full of curated talks, great snacks, and even better company just for you:
Schedule
17:30 doors open, drinks, food
18:00 #1 Applying LLMs in our automated warehouse (Sven Arends - Picnic)
18:30 #2 Python notebooks are better now (Vincent D. Warmerdam - Marimo)
19:00 break, drinks
19:15 #3 Evaluation-Driven Development & Synthetic Data Flywheels (Hugo Bowne-Anderson - Independent Data and AI Scientist, Vanishing Gradients)
19:45 networking drinks
20:45 end
Talk 1: Applying LLMs in our automated warehouse (Sven Arends - Picnic)
We'll share how we're using multimodal LLMs in our automated warehouse. We'll discuss the challenges we've faced across the hardware, software, and GenAI components, and we'll cover the practical aspects of GenAI deployment, including prompt optimization, preventing LLM "yapping," and creating a robust feedback loop for continuous improvement.
Talk 2: Python notebooks are better now (Vincent D. Warmerdam - Marimo)
This talk is about marimo, a Python notebook that completely rethinks how you interact with code in a notebook. There are SQL cells, direct LLM support, stellar widgets, and interactive dataframe tooling. But the most surprising thing is that the notebook goes a step further by changing Python itself! The goal of this session is to explain all of this ... but ... this won't be a talk, it will be a live-coding session instead!
Talk 3: Evaluation-Driven Development & Synthetic Data Flywheels (Hugo Bowne-Anderson - Independent Data and AI Scientist, Vanishing Gradients)
Learn how evaluation-driven development can transform your LLM applications by helping you build a minimum viable evaluation framework (MVE), even before your product reaches real users. In this talk, you'll see how to generate synthetic queries from realistic personas, label outputs to define correctness and failure modes, and construct an evaluation harness to systematically compare models and prompts. The framework also transitions seamlessly into a robust evaluation approach once your app encounters real-world users, guiding iteration through structured analysis and continuously tracking essential metrics such as accuracy, cost, and latency via lightweight observability.
See you all there! 👀