
Observability Engineering Meetup | October Edition

Hosted By
Karim T. and 2 others

Details

This event is Sponsored by Kerno.

[IMPORTANT] Please bring a government-issued photo ID, as it will be required for venue entry.

Hey everyone!
Welcome back to another exciting edition of the Observability Engineering London meetup! This time, we’re diving deep into two critical areas of engineering: dashboards and runbooks, and large-scale migrations.

On Thursday, October 17th, we’ll be joined by two fantastic speakers:

First up, we have Colin Douch, formerly the Observability Tech Lead at Cloudflare. Colin will explore the allure of creating hyper-specific dashboards and runbooks, and why they often do more harm than good in incident response. He’ll share insights on how to avoid the common pitfalls of hyper-specialization and provide a roadmap for using these tools more effectively in SRE practice.

Next up is Will Sewell, Platform Engineer at Monzo, who will take us behind the scenes of how Monzo runs migrations across a staggering 2,800 microservices. Will’s talk will focus on Monzo’s approach to centrally driven migrations, with a specific look at its recent move from OpenTracing to OpenTelemetry.

This is shaping up to be a great event for anyone working with observability, incident response, or large-scale infrastructure. See you there!

👾 Gameplan:
6:00 Welcome drinks
6:30 Colin Douch | Dashboards and Runbooks: Scrapbooking for Engineers
7:15 Break | Food/drinks and Networking
7:30 Will Sewell | How we run migrations across 2,800 microservices
8:00 Break | Food/drinks and Networking
9:00 Wrap up and head to the pub downstairs to keep the conversation going.

👋 Connect with us

See you all there!
Karim
-------------
A Bit About Colin Douch | Having just moved back to London, Colin is taking a well-deserved break. He has been working, advising, and researching in the monitoring and observability space for close to 10 years, and has gained a broad perspective on the difficulties that modern companies, big and small, face in properly introspecting their physical and computerized systems. Originally from New Zealand, he frequently gives talks on developments in observability, introducing new graduates to the field and usually teaching some of the old-timers something new.

Talk | Dashboards and Runbooks: Scrapbooking for Engineers
With the SRE revolution, alert runbooks and metrics dashboards have become vital tools for engineering teams hoping to adopt better incident response strategies. Unfortunately, these tools are often used in ways that make them ineffective at this task. In particular, they are usually created as knee-jerk responses to incidents, without considering where they fit into the overall incident response landscape. This leads to hyper-specialized tooling that often masks the root causes of incidents and hinders incident response rather than aiding it.

Unfortunately, we noticed this problem too late, and it manifested itself in the form of thousands of dashboards and alert references that dramatically muddied our incident response. In this talk, I will cover why creating dashboards and runbooks is such an attractive proposition for engineering teams, why it's so easy to fall into the hyper-specificity trap, why having these runbooks and dashboards is such an issue, and where these tools should instead fit into your incident response structure. I will also introduce the process we followed to dig ourselves out of this issue by migrating to more generic, SLO-based tooling, along with the tooling we created to help discover problematic runbooks and dashboards, so that fellow SREs can solve the problem themselves.

You can connect with Colin on LinkedIn.
----------------
A Bit About Will Sewell | Will is a Platform Engineer at Monzo who has mostly been focussed on building the tools that enable engineers to ship rapidly and safely. More recently, he has been working with the Infrastructure team to lift and shift the whole bank from Monzo's home-rolled Kubernetes cluster to EKS.

Talk | How we run migrations across 2,800 microservices
In this talk, you’ll see how Monzo migrated all 2,800 microservices from OpenTracing to OpenTelemetry. We take a centrally driven approach to migrations, which enables us to upgrade dependencies while maintaining consistency. We’ll focus on both the migration principles and the tools and technologies that make this kind of migration possible, such as our mass deployment tooling and our monorepo.

You can connect with Will on LinkedIn.

The London Observability Engineering Meetup