LLM-as-a-Judge for Voice Agents: Testing Non-Deterministic AI with Simulated Callers

You cannot unit-test a conversation. The testing playbook for production voice agents: a four-layer test pyramid, simulated callers over real audio, LLM-as-a-judge scoring calibrated to design intent, the transcript-integrity trap, and the 2-of-3 flake rule.

AI Voice Agent Architecture: What I Learned Building the Same Agent Three Times

I built the same production voice agent three times. The orchestrator collapsed under coupling, server-gated turns created dead air, and the third architecture — where the realtime model owns the conversation — is the one that survived. Pros, cons, and diagrams of all three.

Fix one prompt — re-test everything.

Why AI Software Development Breaks Traditional Software Engineering

Fixing an AI bug isn't like fixing a regular bug — every prompt change ripples through the whole system. Why AI software development needs a new kind of regression testing, and why traditional software engineering doesn't apply.

Enabling HTTP Streaming with AWS Amplify and Lambda Function URLs

AWS Amplify combined with Lambda function URLs offers a powerful mechanism for building cloud-backed apps that use HTTP streaming. This not only bypasses the need for the…
