AI in Healthcare: Real Examples That Work Today
A builder’s honest tour of where AI in healthcare actually works today — and where it does not. Real operational examples, sorted from working now to overhyped.
You cannot unit-test a conversation. The testing playbook for production voice agents: a four-layer test pyramid, simulated callers over real audio, LLM-as-a-judge scoring calibrated to design intent, the transcript-integrity trap, and the 2-of-3 flake rule.
I built the same production voice agent three times. The orchestrator collapsed under coupling, server-gated turns created dead air, and the third architecture — where the realtime model owns the conversation — is the one that survived. Pros, cons, and diagrams of all three.
Fixing an AI bug isn't like fixing a regular bug — every prompt change ripples through the whole system. Why AI software development needs a new kind of regression testing, and why traditional software engineering doesn't apply.