
What we’re about
Hands-on project-oriented data science, with a heavy focus on machine learning and artificial intelligence. We're here to get neck-deep into projects and actually do awesome things!
Join our new Discord (https://discord.gg/xtFVsSZuPG) where you can:
- discuss more AI/ML papers
- suggest/plan events
- share and discuss GitHub projects
- find and post jobs on our jobs channel
- buy/sell used local GPU/server equipment
- scroll our social media aggregators for the latest AI research news across Bsky, X, Reddit, YouTube, podcasts, and more
The meetup consists of:
- recurring study groups (if you want to start one, just notify Ben to be made a meetup co-organizer).
- intermediate/advanced working groups (starting in 2019)
- occasional talks and gatherings (aiming for at least quarterly starting in 2019)
Upcoming events (4+)
- Paper: Why Do Multi-Agent LLM Systems Fail
Join us for a paper discussion on "Why Do Multi-Agent LLM Systems Fail? A Multi-Agent System Failure Taxonomy (MAST)"
Exploring systematic failure patterns in multi-agent systems through empirical analysis and grounded theory methodology
Featured Paper:
"Why Do Multi-Agent LLM Systems Fail? A Multi-Agent System Failure Taxonomy (MAST)" (Cemri et al., 2025)
arXiv Paper | GitHub Dataset
Discussion Topics:
MAST Taxonomy Framework
- 14 distinct failure modes organized into 3 categories: specification issues (FC1), inter-agent misalignment (FC2), task verification (FC3)
- Grounded theory methodology applied to 200+ execution traces across 7 MAS frameworks
- Cohen's Kappa agreement score of 0.88 between expert annotators
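For readers who have not met Cohen's Kappa before, here is a minimal sketch of how the statistic is computed for two annotators. The label lists are made-up illustrations, not data from the paper, and the function name `cohens_kappa` is our own.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations of six traces (assumption, not from the paper).
annotator_1 = ["FC1", "FC1", "FC2", "FC2", "FC3", "FC3"]
annotator_2 = ["FC1", "FC1", "FC2", "FC3", "FC3", "FC3"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))  # 0.75
```

Kappa corrects raw agreement for the agreement expected by chance, so a value of 1 means perfect agreement and 0 means no better than chance.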
Failure Category Analysis
| Category | Prevalence | Key Failure Modes | Impact |
| -------- | ---------- | ----------------- | ------ |
| FC1: Specification Issues | 41.77% | Step repetition (17.14%), Task disobedience (10.98%) | Design flaws |
| FC2: Inter-Agent Misalignment | 36.94% | Reasoning-action mismatch (13.98%), Clarification failure (11.65%) | Coordination breakdown |
| FC3: Task Verification | 21.30% | Incomplete verification (6.82%), Incorrect verification (6.66%) | Quality control gaps |
Implementation Challenges
- ChatDev achieves only 33.33% correctness on ProgramDev benchmark despite explicit verifier agents
- Superficial verification strategies (compilation checks vs functional correctness)
- System design issues beyond base LLM limitations
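The gap between a compilation check and functional correctness can be shown with a toy sketch. The generated snippet and both checker functions below are hypothetical and are not taken from ChatDev or the paper. A real verifier would also run untrusted code in a sandbox instead of calling `exec` directly.

```python
# Hypothetical generated program: syntactically valid but functionally wrong.
generated_source = """
def add(a, b):
    return a - b
"""

def compiles(source):
    """Superficial check: does the source parse and compile?"""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

def passes_functional_test(source):
    """Functional check: run the code and compare against an expected result."""
    namespace = {}
    exec(source, namespace)  # assumption: trusted toy input only
    return namespace["add"](2, 3) == 5

compile_ok = compiles(generated_source)
functional_ok = passes_functional_test(generated_source)
print(compile_ok, functional_ok)  # True False
```

A verifier that stops at the first check would accept this program even though it computes the wrong result.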
Key Technical Features
- LLM-as-a-judge pipeline using OpenAI o1 with 94% accuracy and 0.77 Cohen's Kappa
- Validated across unseen systems (Magentic-One, OpenManus) with 0.79 agreement score
- Intervention studies showing +15.6% improvement through architectural changes
Future Directions
- Efficiency taxonomy development beyond correctness metrics
- Structural redesign principles for high-reliability multi-agent organizations
- Integration with constitutional AI and verification frameworks
***
Silicon Valley Generative AI Meeting Formats
Paper Reading
- Biweekly sessions on multi-agent system reliability
- Collaborative analysis with Boulder Data Science
Talks
- Monthly presentations on agent coordination and system design
- Topics range from failure analysis to organizational design principles
Silicon Valley Generative AI has two meeting formats.
1. Paper Reading - Every second week we meet to discuss machine learning papers. This is a collaboration between Silicon Valley Generative AI and Boulder Data Science.
2. Talks - Once a month we meet to have someone present on a topic related to generative AI. Speakers range from industry leaders, researchers, startup founders, and subject matter experts to anyone with an interest in a topic who would like to share it. Topics vary from technical to business focused. They can cover how the latest generative models work and how they can be used, applications and adoption of generative AI, demos of projects and startup pitches, or legal and ethical topics. The talks are meant to be inclusive and aimed at a more general audience than the paper readings.
If you would like to be a speaker or suggest a paper, email us at svb.ai.paper.suggestions@gmail.com or join our new Discord!
- Reinforcement Learning: Chapter 3 Finite Markov Decision Processes
Chapter 3 introduces the mathematical formalism for defining the full reinforcement learning problem in the book. We will cover the definition of probability transition functions, reward signals, and the discounted return. If there is time we will continue with the discussion of policies and value functions as explained with the gridworld example.
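As a warm-up for the discounted return, here is a small sketch that computes G_0 = R_1 + γR_2 + γ²R_3 + ... for a finite reward sequence. The function name, the reward values, and the discount factor are illustrative assumptions, not examples from the book.

```python
def discounted_return(rewards, gamma):
    """Return G_0 = sum over k of gamma**k * rewards[k] for a finite sequence."""
    g = 0.0
    # Work backwards using the recursion G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical three-step episode (assumption, for illustration only).
rewards = [1.0, 0.0, 2.0]
g0 = discounted_return(rewards, gamma=0.5)
print(g0)  # 1.5
```

The backward loop uses the same recursive relationship between successive returns that underlies the value function definitions later in the chapter.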
As usual you can find below links to the textbook, previous chapter notes, slides, and recordings of some of the previous meetings.
Useful Links:
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
Recordings of Previous Meetings
Short RL Tutorials
My exercise solutions and chapter notes
Kickoff Slides which contain other links
Video lectures from a similar course
- Reinforcement Learning: Topic TBA
Typically covers chapter content from Sutton and Barto's RL book
- Reinforcement Learning: Topic TBA
Typically covers chapter content from Sutton and Barto's RL book