Talk: Observability at Scale


Details
Welcome, Pythonistas.
Bring your lunch and join us at Gigaparts for our monthly talks. This month we'll hear from Chris Hodges about scaling Loki and Grafana to handle multi-petabyte workloads.
11:30-12:00 Socialization
12:00-12:30 Loki at Dropbox: Strategies for reliable petabyte-scale logging
12:30-1:00 TBD
Loki at Dropbox: Strategies for reliable petabyte-scale logging by Chris Hodges
How do you manage logging at petabyte scale? In this session, Infrastructure Software Engineer Chris Hodges shares hard-won lessons from scaling Grafana Loki to manage multi-petabyte unstructured logs for 1,000+ services at Dropbox. Learn how his team evolved a single distributed Loki cluster into a reliable petabyte-scale logging platform while balancing developer needs and operational realities. He will present strategies for operationalizing Loki at enterprise scale, including multi-tenancy guardrails, label cardinality containment, runtime kill switches, limit configuration, and multihoming.
Chris is an Infrastructure Software Engineer at Dropbox, where he is the Tech Lead for the Logging and Distributed Tracing team, which operates a variety of observability tools at scale, including profiling, error monitoring, Grafana, Loki, and Tempo. He has a diverse background that includes embedded software development in the telecom and defense industries.
TBD
We're still looking for a 30-minute talk for this session.
We're always looking for more speakers, including for the open 30-minute slot at this event. Please indicate if you're interested when you sign up.
