FREE data + AI knowledge sessions to help and learn from each other
Session 5: ML Evaluation & Backtesting — How to Know If Your AI Actually Works
Everyone builds models. Few test them honestly. This session covers evaluation methodology and how to use AI agents to automate the testing process.
Demo (~15 min): Run a backtest on sample data. Show honest results: some approaches work (59%), some don't (49% — worse than random). Run an optimizer across 1,000 parameter combinations.
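A minimal sketch of the kind of backtest the demo walks through, assuming hypothetical data and a small scikit-learn parameter grid (the live session scales the sweep to ~1,000 combinations on real sample data):

```python
# Hypothetical sketch, not the session's actual demo code:
# walk-forward backtest plus a small parameter sweep, with a shuffled-label
# baseline so "close to random" results are visible in the output.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                                 # hypothetical features
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)    # weakly learnable target

def backtest(params):
    """Walk-forward accuracy: train on past folds, score only on the next one."""
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
        model = RandomForestClassifier(random_state=0, **params)
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))
    return np.mean(scores)

# Small grid sweep; the demo extends this idea to ~1,000 parameter combinations.
grid = [{"n_estimators": n, "max_depth": d} for n in (50, 200) for d in (2, 5, None)]
for params in grid:
    print(params, f"accuracy = {backtest(params):.2f}")

# Baseline: the same pipeline on shuffled labels should land near 0.50,
# which is the yardstick for spotting approaches that are worse than random.
y = rng.permutation(y)
print("shuffled-label baseline:", f"{backtest(grid[0]):.2f}")
```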
AI Hands-on (~15 min): Give an LLM your model's evaluation results and ask it to critique: "Here are my model metrics. What's wrong with my evaluation? What am I missing? How could I be fooling myself?" Compare how different models challenge your assumptions. Surprisingly useful for catching blind spots.
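A rough sketch of the hands-on exercise, assuming the openai Python client, a placeholder model name, and made-up metrics; any chat-capable LLM (or just pasting into a chat UI) works the same way:

```python
# Hypothetical sketch of "AI as model critic": hand the LLM your evaluation
# numbers and ask it to poke holes in the methodology.
from openai import OpenAI

metrics = {                      # made-up example metrics, not real results
    "accuracy": 0.59,
    "train_period": "2018-2022",
    "test_period": "2023",
    "n_features": 42,
    "cv_scheme": "5-fold (not time-aware)",
}

prompt = (
    f"Here are my model metrics: {metrics}. "
    "What's wrong with my evaluation? What am I missing? "
    "How could I be fooling myself?"
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; swap in each model you want to compare
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Running the same prompt against two or three different models and comparing their critiques is the quickest way to see which blind spots each one catches.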
Open Floor (~20 min): How do you evaluate models in practice? A model that looked great in testing but failed in production — what happened? How do you explain model performance to non-technical stakeholders? Share your evaluation frameworks.
***
| Session | Special Topic | AI Component |
| ------- | ------------- | ------------ |
| 1 | Data & AI Ecosystem overview | First prompt challenge — everyone builds something |
| 2 | Cloud Infrastructure (AWS/GCP/Snowflake) | Vibe-code a Dockerfile + docker-compose with AI |
| 3 | Data Pipelines & Quality | AI anomaly detection on messy data |
| 4 | Analytics & BI | Text-to-SQL — ask your database in English |
| 5 | ML Evaluation & Backtesting | AI as model critic — "what's wrong with my eval?" |
| 6 | LLMs Deep Dive | Live model shootout — 4 models, same prompt, group scores |
| 7 | AI Agents — Tools, RAG, Memory | Vibe-code a working agent together |
| 8 | Multi-Agent Systems | Group design exercise — sketch a multi-agent for your problem |
| 9 | Vibe Coding Deep Dive | Build a complete feature live with AI |
| 10 | Show-and-Tell & Opportunities | Community demos + jobs + what's next |
