| Field | Value |
|---|---|
| Title | Building Voice AI Agents That Don’t Suck [Kwindla Kramer] - 739 |
| Duration | 01:12:33 |
| Viewed | 1,659 |
| Published | 15-07-2025 |
| Source | Youtube |
In this episode, Kwindla Kramer, co-founder and CEO of Daily and creator of the open source Pipecat framework, joins us to discuss the architecture and challenges of building real-time, production-ready conversational voice AI. Kwin breaks down the full stack for voice agents, from the models and APIs to the critical orchestration layer that manages the complexities of multi-turn conversations. We explore why many production systems favor a modular, multi-model approach over the end-to-end models demonstrated by large AI labs, and how that choice affects everything from latency and cost to observability and evaluation. Kwin also digs into the core challenges of interruption handling, turn-taking, and creating truly natural conversational dynamics, and how to overcome them. We also discuss use cases, thoughts on where the technology is headed, the move toward hybrid edge-cloud pipelines, the exciting future of real-time video avatars, and much more.

🗒️ For the full list of resources for this episode, visit the show notes page: https://twimlai.com/go/739

🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confirmation=1

🗣️ CONNECT WITH US!
===============================
- Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
- Follow us on Twitter: https://twitter.com/twimlai
- Follow us on LinkedIn: https://www.linkedin.com/company/twimlai/
- Join our Slack Community: https://twimlai.com/community/
- Subscribe to our newsletter: https://twimlai.com/newsletter/
- Want to get in touch? Send us a message: https://twimlai.com/contact/

📖 CHAPTERS
===============================
- 00:00 - Introduction
- 9:40 - Voice AI stack
- 12:37 - Pipecat and Daily
- 16:15 - Challenges of voice applications and how they differ from web applications
- 16:53 - Latency
- 19:45 - Real-time oriented transport standard
- 21:22 - Building blocks in voice agents
- 23:36 - Overcoming the challenges of activity detection and interruption handling
- 29:16 - Single big model vs multiple models
- 36:25 - Turn detection system
- 39:29 - Linkage of turn detection and voice activity detection
- 42:32 - Starter kits in Pipecat
- 43:28 - Evaluation for voice AI
- 49:22 - Text as an intermediary for observability and compliance
- 52:05 - Opportunities in video
- 54:45 - Challenges in video
- 56:36 - Inference on the edge
- 58:26 - Barriers in hybrid pipelines
- 1:00:17 - Pipecat
- 1:03:34 - MCP in Pipecat
- 1:07:41 - Gaps between measurements
- 1:10:52 - Use cases

🔗 LINKS & RESOURCES
===============================
- Daily - https://www.daily.co/
- Pipecat - https://github.com/pipecat-ai/pipecat
- Pipecat Cloud - https://pipecat-cloud.mintlify.app/introduction
- Building the Internet of Agents with Vijoy Pandey - 737 - https://twimlai.com/podcast/twimlai/building-the-internet-of-agents/
- Building AI Voice Agents with Scott Stephenson - 707 - https://twimlai.com/podcast/twimlai/building-ai-voice-agents/

- 📸 Camera: https://amzn.to/3TQ3zsg
- 🎙️ Microphone: https://amzn.to/3t5zXeV
- 🚦 Lights: https://amzn.to/3TQlX49
- 🎛️ Audio Interface: https://amzn.to/3TVFAIq
- 🎚️ Stream Deck: https://amzn.to/3zzm7F5