Optimizing Inferencing

Name: Optimizing Inferencing
Start: 2026-03-12T17:00:00+01:00
End: 2026-03-12T21:00:00+01:00
Location: AI Sweden

Hosted by Patrick C.

Stockholm MLOps Community

Details

All right!!!

Our fourth meetup of 2026 will take place March 12! This time, we'll team up with evroc and once again meet up at AI Sweden!

Our theme for this meetup is Inferencing and, more specifically, how to optimize inferencing from perspectives such as cost, performance and sovereignty. The event program is still being worked out, so stay tuned for updates!

Event Program

- Doors open at 17:00 CET

- Talks begin at 17:45 CET

- There will be light food & drinks

- There may be a moderated Q&A session

Speaker Line-Up (under construction)

Matthijs Kok, Lead AI Plumber at evroc, will give a talk titled "Hitchhikers' Guide to the World of Inference: the Bumpy Road to Scalability". Abstract: Scaling serverless inference is less about GPUs and more about what breaks when you add them. We dig deep into a bug that corrupted live agentic sessions and the zero-downtime release machinery that lets us ship fixes and new models.
Lucas Ferreira, Founder & CEO of Inceptron, will give a talk titled "From Model Access to Model Ownership: Scaling Open AI Systems". Abstract: As AI moves from closed-model APIs to open models, the promise is clear: lower costs, more control, and greater flexibility. But that shift also brings a much harder MLOps challenge, spanning deployment, scaling, observability, infrastructure ownership, and performance optimisation. In this talk, Lucas Ferreira will explore what it really takes to run open models in production, drawing on Inceptron’s experience operating a serving stack that processes 15B+ tokens per day on OpenRouter. He’ll also share why infrastructure advantage increasingly comes from the optimisation layer itself, including Inceptron’s compiler, which automatically analyses and optimises models for target hardware across GPUs, FPGAs, and edge environments. The session will unpack the operational realities of moving from the convenience of closed models to the control of open ones, and what teams need to make that transition reliable, efficient, and production-ready.
Göran Sandahl, Co-Founder & CEO of Opper, will give a talk titled "Model Interoperability in Practice: Why Your Agents Shouldn't Care Which Model It Uses". Abstract: AI models evolve fast, but most teams are locked into a single, often too large, model. In this talk, I’ll show what model interoperability looks like in practice. Through two live demos, a multi-model chat and an AI roundtable where models debate each other, you’ll see how a single API layer can route, compare, and orchestrate across models from different providers, turning models into interchangeable units.
Emelie Wahlström, Head of Programs; Rasmus Larsson, Senior Data Scientist; and Daniel Gustafsson, Senior Program Manager, all at AMD Silo AI, will give a talk titled "Accelerating Inference with AMD". Abstract: A quick presentation of how AMD’s EAI Suite accelerates model inference and fits into real-world MLOps workflows.

Looking forward to seeing you there,

/Patrick & the Stockholm MLOps Team
