## TorontoAI Observability Summit — March 2026
TorontoAI is planning an Observability Summit in March 2026. The event will bring together observability startups, platform engineers, SREs, AI/ML engineers, and cloud builders to share what’s working, what’s broken, and what’s next in monitoring and reliability.
The goal is simple: connect the Toronto observability ecosystem—and make it easy for founders, practitioners, and vendors to meet, demo, collaborate, and (hopefully) spark new partnerships.
### What we’ll cover
#### Observability for Neo-Cloud Providers (Next-Gen Infrastructure)
New cloud providers and “neo-cloud” platforms are changing the rules—especially around GPU infrastructure, multi-tenant platforms, and cost/performance tradeoffs. We’ll explore:
- The observability challenges neo-cloud teams face at scale (metrics volume, cost control, noisy alerts, multi-tenant isolation)
- Reliability patterns for modern platforms (control plane vs data plane visibility)
- Capacity planning, SLOs, and incident response in high-throughput environments
- Real-world war stories: what actually worked in production
#### LLM Observability and Inference Engineering
LLM-powered apps introduce new failure modes that don’t show up in traditional monitoring. This track focuses on:
- Observability for LLM inference (latency, throughput, queueing, batch sizing, token-level metrics)
- Prompt/runtime monitoring: quality, drift, hallucination signals, guardrails, and evaluation
- RAG pipeline visibility (retrieval quality, embedding/index performance, cache hit rates)
- Debugging and performance optimization for modern LLM stacks (vLLM/TGI/Triton, GPUs, model routing)
#### Traditional Cloud Observability (Kubernetes + Applications)
Core observability still matters, especially as stacks grow more complex. We’ll cover:
- Kubernetes observability patterns (clusters, nodes, workloads, autoscaling behavior)
- Application performance monitoring (APM), tracing, logging, and correlation
- Incident workflows, on-call hygiene, and practical alerting strategies
- Tooling approaches: OpenTelemetry, metrics/logs/traces pipelines, and cost-aware observability
### Who should attend
- Platform / Infrastructure Engineers
- SREs and DevOps teams
- Observability engineers and architects
- AI/ML engineers building LLM apps in production
- Founders and builders in the observability space
- Neo-cloud / GPU cloud operators
### Call for speakers, demos, and sponsors
We’re now gathering early interest from:
- Speakers (deep technical talks, case studies, lessons learned)
- Demo teams (product demos or open-source walkthroughs)
- Sponsors (to help scale this into a full-day summit)
If you’re building in observability—whether for cloud infrastructure, Kubernetes, or LLM systems—and want to share a talk or demo, TorontoAI would love to include you.
