## TorontoAI Observability Summit — March 2026

TorontoAI is planning an Observability Summit in March 2026. The event will bring together observability startups, platform engineers, SREs, AI/ML engineers, and cloud builders to share what’s working, what’s broken, and what’s next in monitoring and reliability.

The goal is simple: connect the Toronto observability ecosystem—and make it easy for founders, practitioners, and vendors to meet, demo, collaborate, and (hopefully) spark new partnerships.

### What we’ll cover

#### Observability for Neo-Cloud Providers (Next-Gen Infrastructure)

New cloud providers and “neo-cloud” platforms are changing the rules, especially around GPU infrastructure, multi-tenant platforms, and cost/performance tradeoffs. We’ll explore:

  • The observability challenges neo-cloud teams face at scale (metrics volume, cost control, noisy alerts, multi-tenant isolation)
  • Reliability patterns for modern platforms (control plane vs data plane visibility)
  • Capacity planning, SLOs, and incident response in high-throughput environments
  • Real-world war stories: what actually worked in production

#### LLM Observability and Inference Engineering

LLM-powered apps introduce new failure modes that don’t show up in traditional monitoring. This track focuses on:

  • Observability for LLM inference (latency, throughput, queueing, batch sizing, token-level metrics)
  • Prompt/runtime monitoring: quality, drift, hallucination signals, guardrails, and evaluation
  • RAG pipeline visibility (retrieval quality, embedding/index performance, cache hit rates)
  • Debugging and performance optimization for modern LLM stacks (vLLM/TGI/Triton, GPUs, model routing)

#### Traditional Cloud Observability (Kubernetes + Applications)

Core observability still matters—especially as stacks grow more complex. We’ll include:

  • Kubernetes observability patterns (clusters, nodes, workloads, autoscaling behavior)
  • Application performance monitoring (APM), tracing, logging, and correlation
  • Incident workflows, on-call hygiene, and practical alerting strategies
  • Tooling approaches: OpenTelemetry, metrics/logs/traces pipelines, and cost-aware observability

### Who should attend

  • Platform / Infrastructure Engineers
  • SREs and DevOps teams
  • Observability engineers and architects
  • AI/ML engineers building LLM apps in production
  • Founders and builders in the observability space
  • Neo-cloud / GPU cloud operators

### Call for speakers, demos, and sponsors

We’re now accepting early expressions of interest from:

  • Speakers (deep technical talks, case studies, lessons learned)
  • Demo teams (product demos or open-source walkthroughs)
  • Sponsors (to help scale this into a full-day summit)

If you’re building in observability—whether for cloud infrastructure, Kubernetes, or LLM systems—and want to share a talk or demo, TorontoAI would love to include you.
