Why Speech-to-Speech APIs Fail When Voice AI Needs to Evaluate
Google shipped Gemini Live. OpenAI launched the Realtime API. The pitch is seductive: stream audio in, get audio back. One WebSocket, sub-second latency. But what happens when your AI needs to evaluate a human, not just chat with them?
Join Niraj Kothawade as he walks through the architecture decisions behind MasterPrep AI — a voice AI platform that interviews and assesses candidates in real time for enterprise hiring — and why he rejected speech-to-speech in favour of a server-side orchestration pipeline. Niraj will cover how state machines detect candidate behavioural patterns in real time, why LLMs are unreliable at enforcing hard limits, and how the pipeline enables capabilities like AI plagiarism detection that speech-to-speech makes impossible.
In this talk, you will learn:
- Why speech-to-speech APIs break down when voice AI needs to evaluate, not just converse
- How server-side state machines detect behavioural patterns like Solution Traps and Logic Gaps in real time
- The cost reality of audio tokens vs text tokens at scale
- What the pipeline unlocks that speech-to-speech can't — structured feedback, AI plagiarism detection, and deterministic control
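The core argument is that deterministic checks belong outside the model. As a rough illustration (the event names and the specific "Solution Trap" rule below are invented for this sketch, not taken from MasterPrep's implementation), a server-side detector can be a small state machine that an LLM cannot talk its way around:

```python
from enum import Enum, auto

class Phase(Enum):
    # Hypothetical interview phases, for illustration only.
    CLARIFYING = auto()
    SOLVING = auto()
    FLAGGED = auto()

class SolutionTrapDetector:
    """Toy sketch: flag a candidate who jumps straight to coding
    without asking any clarifying questions (a 'Solution Trap')."""

    def __init__(self) -> None:
        self.phase = Phase.CLARIFYING
        self.clarifying_questions = 0

    def on_transcript_event(self, event: str) -> Phase:
        if event == "question_asked" and self.phase is Phase.CLARIFYING:
            self.clarifying_questions += 1
        elif event == "code_started" and self.phase is Phase.CLARIFYING:
            # Hard, deterministic rule enforced server-side:
            # zero clarifying questions before coding trips the trap.
            if self.clarifying_questions == 0:
                self.phase = Phase.FLAGGED
            else:
                self.phase = Phase.SOLVING
        return self.phase

detector = SolutionTrapDetector()
detector.on_transcript_event("code_started")
print(detector.phase)  # Phase.FLAGGED
```

Because the rule lives in ordinary server code rather than a prompt, it fires the same way every time — which is the kind of deterministic control a pure speech-to-speech loop does not expose.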
About Niraj
Niraj Kothawade is a product leader and founder of MasterPrep AI, a voice AI platform for candidate interviews and assessment. He has 15+ years of experience building products at scale across companies including Deputy, Flipkart, and Yahoo. Find him on LinkedIn and X.
