About TorontoAI
TorontoAI is a vibrant, inclusive community of engineers, builders, founders, and curious minds passionate about making AI infrastructure more accessible, human-centered, and scalable.
We host bi-weekly in-person socials, tech meetups, and hands-on webinars to connect people across disciplines — from DevOps to Data Science, from students to senior architects. Whether you're deploying LLMs in production or just exploring what Databricks does, you're welcome here.
🤝 We’re Building More Than a Meetup
In a world dominated by virtual everything, we believe in real, human-to-human connection.
TorontoAI is a space to:
- Share ideas over coffee
- Spark collaborations face-to-face
- Meet people who understand your stack and your journey
- Build your network beyond LinkedIn likes
💬 What We Talk About:
- Scalable AI & LLM infrastructure (Kubernetes, GPUs, vLLM, Ollama, LangChain)
- Databricks, Snowflake, Fivetran, dbt — building the modern data stack
- MLOps, LLMOps, DevOps — the operational glue of AI systems
- Real-world engineering stories, founder spotlights, and tool breakdowns
🌈 Who We Welcome:
- DevOps, SREs & Platform Engineers moving into data/AI
- Data Engineers, Analysts & ML practitioners
- Founders, freelancers, and technologists in transition
- Students and early-career professionals seeking real-world exposure
We’re committed to creating a welcoming, diverse, and equity-focused space where all voices matter — no gatekeeping, no rockstars, just good humans building cool stuff.
📍 Based in Toronto, open to the world
📅 Join an event — and be part of something human, helpful, and hands-on.
Upcoming events (4)

Panel Discussion: Intellectual Property in the Era of Vibe Coding
401 Bay Street, Meet at Bay Street Entrance, Toronto, ON, CA
When code becomes commoditized, what actually gets protected?
This is an in-person panel discussion hosted at the Dipchand LLP office, bringing together experts from legal, AI, and strategy domains to explore how intellectual property is evolving in the age of AI-assisted development and “vibe coding.”
Why this matters:
AI tools are rapidly commoditizing software development. The barrier to building products is dropping, shifting the focus from writing code to owning ideas, data, and systems. This creates new challenges around ownership, licensing, and long-term defensibility.
Key discussion areas:
- Whether code still holds value as intellectual property
- Ownership of AI-generated code and outputs
- What developers and companies should protect beyond code (data, workflows, architecture)
- Enterprise risks including compliance, governance, and data exposure
- How organizations build defensibility when building becomes easy
Speakers:
- Stephano Salani, Intellectual Property Lawyer, Dipchand LLP
- Yulia Pavlova, PhD, Applied AI and Governance Leader, RBC Borealis AI
- Mohit Rajhans, AI Consultant, ThinkStart.ca
Event details:
Date: May 20, 2026
Time: 5:30 PM to 7:30 PM EDT
Location: Dipchand LLP Office, Toronto, ON
What to expect:
Panel discussion, networking, and Q&A session
RSVP - https://luma.com/0ktn86g3
2 attendees
Genie, Agent Bricks, or Build Your Own on Databricks Lakebase
Online
For Data Engineering leaders deciding whether to adopt Genie, build a custom text-to-SQL stack, or wire something in between.
90 minutes. Live build, not slides. Real workspace, real data, a real LLM call across an HTTP boundary you control.
What you will see
I'll go from an empty Databricks workspace to a working text-to-SQL agent that:
- Joins live OLTP rows in Lakebase (managed Postgres) with pre-aggregated gold tables in Unity Catalog Delta — through Lakehouse Federation, in a single query.
- Generates SQL via a pluggable LLM endpoint — Databricks Model Serving, OpenAI-compatible APIs, or a self-hosted vLLM on a neo-cloud GPU — switched with one environment variable.
- Validates every SQL string before execution with a SELECT-only safety guardrail that catches the Databricks-specific destructive ops generic validators miss (`OPTIMIZE`, `VACUUM`, `ZORDER`, `COPY`); a minimal sketch of this check follows the list.
- Is auditable end-to-end: one question = one LLM call, one SQL statement, one execution. No autonomous loops, no surprise bills.
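To give a flavor of that guardrail, here is a minimal sketch in Python using a simple keyword-denylist approach. The function name and keyword list are illustrative only, not the actual db-agent implementation:
```
import re

# Statements that mutate or reorganize data. The last four are
# Databricks-specific maintenance/ingest commands that generic
# "SELECT-only" validators often don't know about.
BLOCKED_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "MERGE", "DROP", "ALTER", "CREATE",
    "TRUNCATE", "GRANT", "REVOKE",
    "OPTIMIZE", "VACUUM", "ZORDER", "COPY",
}

def is_safe_select(sql: str) -> bool:
    """Allow a single SELECT (or WITH ... SELECT) statement, nothing else."""
    stripped = sql.strip().rstrip(";")
    # Reject multi-statement payloads outright.
    if ";" in stripped:
        return False
    # Must start with SELECT or a CTE.
    if not re.match(r"(SELECT|WITH)\b", stripped, re.IGNORECASE):
        return False
    # Reject any blocked keyword appearing as a whole word.
    tokens = re.findall(r"[A-Za-z_]+", stripped.upper())
    return not BLOCKED_KEYWORDS.intersection(tokens)

assert is_safe_select("SELECT * FROM sales.gold.daily_revenue")
assert not is_safe_select("OPTIMIZE sales.gold.daily_revenue ZORDER BY (day)")
```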
What you will leave with
- A decision framework for db-agent vs Genie vs Agent Bricks for your specific use case — including when not to build.
- The companion open-source db-agent repo (presented at AAAI-25, ships a Databricks Apps deployment variant) and a quick-lab repo with a step-by-step build.
- A reference architecture diagram and the actual code — pipeline orchestrator is ~60 lines of Python, safety validator is ~30.
- Specific gotchas that cost me a half-day each: federation database options, Lakebase token rotation, Streamlit/Apps reverse-proxy traps, context-window blowouts on real catalogs.
Who this is for
- Heads of Data, Data Engineering Managers, Staff and Principal Data Engineers.
- Teams already on Databricks (or evaluating) who are being asked: "Can we put an AI agent on top of this?"
- Anyone making a build-vs-buy call between Genie, Agent Bricks, and a custom text-to-SQL stack — and wants to make it with their eyes open.
- Demo of the Reference Architecture explained here - https://becloudready.com/blog/text-to-sql-databricks-lakebase-db-agent
This is a technical session. We'll read code. Bring your senior engineers.
Agenda
- The architecture in one slide (5 min)
- Lakebase + Unity Catalog + Lakehouse Federation — why both data planes, and what breaks (15 min)
- The agent pipeline — schema → prompt → LLM → validate → execute (20 min)
- The SQL safety guardrail — what generic SELECT-only validators miss on Databricks (10 min)
- The pluggable LLM layer — live swap from a hosted API to a self-hosted vLLM on a neo-cloud GPU (15 min); a sketch of the swap follows the agenda
- db-agent vs Genie vs Agent Bricks — when to use which, and why (10 min)
- Q&A (15 min)
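For a taste of that swap before the session: the pluggable layer is just an OpenAI-compatible client whose endpoint comes from the environment, so Databricks Model Serving, a hosted API, and a self-hosted vLLM server all look identical to the pipeline. A minimal sketch assuming the `openai` Python client; the environment variable names are illustrative, and the session's version collapses this to a single variable:
```
import os
from openai import OpenAI

# Point LLM_BASE_URL at Databricks Model Serving, any OpenAI-compatible
# API, or a self-hosted vLLM server; nothing else in the pipeline changes.
client = OpenAI(
    base_url=os.environ["LLM_BASE_URL"],   # e.g. http://my-vllm-host:8000/v1
    api_key=os.environ["LLM_API_KEY"],
)

def generate_sql(question: str, schema: str) -> str:
    """One question -> one LLM call that returns a single SQL string."""
    resp = client.chat.completions.create(
        model=os.environ["LLM_MODEL"],      # e.g. the served model's name
        messages=[
            {"role": "system",
             "content": f"Write one SELECT statement for this schema:\n{schema}"},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```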
About the Speaker
Chandan Kumar — founder of BeCloudReady, organizer of the TorontoAI community (10K+ members), and a Databricks Partner. Maintainer of the open-source db-agent text-to-SQL agent, presented at AAAI-25. Runs the Databricks Lakehouse Bootcamp and works with engineering teams on getting AI agents into production against real data.
18 attendees
Build Your First AWS Data Lake in 60 Minutes — Live, with Real Code
Online
A free, hands-on community session for data engineers and folks breaking into data.
If you've read the AWS docs but never actually built the pipeline end-to-end — or you've shipped pieces of it but never seen how they all fit together — this session is for you. We'll go from an empty AWS account to a working data lake in 90 minutes. Live build, real code, real data.
### What we'll build together
The pipeline that's underneath every "data lake on AWS" project — once you see it once, you see it everywhere:
```
S3 raw CSV → Glue Crawler → Glue Catalog → Glue ETL Job
→ S3 curated Parquet → Glue Crawler → Athena
```
Concretely, you'll watch:
- A raw CSV (Kaggle Crude Oil historical data, ~6,400 rows) land in S3
- A Glue Crawler infer the schema and register a table in the Glue Catalog
- An Athena query against that CSV — count rows, filter, aggregate, with no servers to manage
- A PySpark Glue ETL job transform the CSV into partitioned Parquet (columnar, compressed, ~10× cheaper to scan); a sketch of this step follows the list
- A second crawler register the Parquet table
- The same query, run again — and a side-by-side comparison of "data scanned" between CSV and Parquet
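If you want to preview the core transform, here is a minimal PySpark sketch of the CSV-to-partitioned-Parquet step. It's a plain Spark rendering for readability (the lab's actual job runs under Glue), and the bucket paths and `Date` column are placeholders:
```
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

# Placeholder paths; the lab wires these to your own sandbox bucket.
RAW = "s3://my-datalake-bucket/raw/crude_oil/"
CURATED = "s3://my-datalake-bucket/curated/crude_oil/"

df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(RAW)
)

# Derive a partition column, then write columnar, compressed Parquet.
(
    df.withColumn("year", F.year(F.to_date(F.col("Date"))))
      .write
      .mode("overwrite")
      .partitionBy("year")
      .parquet(CURATED)
)
```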
That last comparison is the punchline. Parquet vs CSV is the difference between a $5 query and a $0.50 query at scale. Seeing it in the Athena UI lands differently than reading about it.
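The comparison is also scriptable: Athena reports scanned bytes per query via `get_query_execution`. A hedged boto3 sketch, with placeholder database, table, and output-location names; run it once against the CSV table and once against the Parquet table:
```
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

def scanned_bytes(sql: str) -> int:
    """Run a query and return how many bytes Athena scanned."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "crude_oil_db"},   # placeholder
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]
        if state["Status"]["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    return state["Statistics"]["DataScannedInBytes"]

csv_bytes = scanned_bytes("SELECT avg(price) FROM raw_crude_oil")
parquet_bytes = scanned_bytes("SELECT avg(price) FROM curated_crude_oil")
print(f"CSV: {csv_bytes:,} bytes vs Parquet: {parquet_bytes:,} bytes")
```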
### What you'll leave with
- The full lab open-sourced — Terraform for the IAM + sandbox + Glue + Athena setup, the PySpark ETL script, and a step-by-step walkthrough you can re-run on your own AWS account.
- A working mental model of the shape of every data lake project: land raw → catalog → transform → catalog again → query. The dataset is just the variable.
- Practical IAM patterns most tutorials skip — region-locking, prefix-scoped S3 access, Glue role policies that don't accidentally grant the world (one such policy is sketched after this list). The kind of thing that actually shows up in production reviews.
- A take-home assignment: run the same pipeline against a Kaggle dataset of your choice. Bring it to the next session for feedback.
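To make the prefix-scoped pattern concrete: rather than `s3:*` on `*`, the ETL role gets only the actions it needs on specific prefixes. A boto3 sketch with placeholder bucket and role names (the lab expresses the same idea in Terraform):
```
import json
import boto3

iam = boto3.client("iam")

# Read raw, write curated, nothing else; all scoped to one bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-datalake-bucket/raw/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::my-datalake-bucket/curated/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::my-datalake-bucket",
            # Region-locking is a separate condition (aws:RequestedRegion)
            # on the broader role policy; omitted here for brevity.
        },
    ],
}

iam.put_role_policy(
    RoleName="glue-etl-lab-role",               # placeholder role
    PolicyName="datalake-prefix-scoped-s3",
    PolicyDocument=json.dumps(policy),
)
```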
### Who this is for
- Data engineers who've shipped pieces of a data lake but want to see the whole pipeline end-to-end
- Career changers moving into data engineering — this is a portfolio-grade project you can talk about in interviews
- Bootcamp grads and self-taught engineers who can write SQL and Python but haven't seen how Glue, Athena, and S3 actually fit together
- Backend engineers picking up data work and wanting a fast on-ramp to the AWS data stack
- Anyone whose manager said "we should look at building a data lake" and now it's on your plate
If you've never opened the AWS console, you'll still follow along — we explain every click. If you've been doing this for years, you'll probably still pick up the IAM scoping pattern.
### Format
- Live on Microsoft Teams — questions in chat, full screen-share, no slides
- 90 minutes — same length as the lab itself
- Recording shared with everyone who registers
- Open Q&A throughout, not just at the end
### About your host
Chandan Kumar — founder of BeCloudReady and organizer of TorontoAI, a 10K+ member community of AI and data builders. Twenty-plus years across software, cloud, and data engineering. Has trained and placed 500+ engineers across Canada and the US. Maintainer of open-source labs and the db-agent project (presented at AAAI-25).
23 attendees
Past events (275)


