Building the Next Generation of Conversational AI

Inside the Code: Ankit Kumar (Sesame) & Anjney Midha (a16z) on the Future of Voice AI

What goes into building a truly natural-sounding AI voice? In this episode, Sesame’s cofounder and CTO, Ankit Kumar, joins a16z’s Anjney Midha for a deep dive into the research and engineering behind their voice technology.

They discuss the technical challenges of real-time speech generation, the trade-offs in balancing personality with efficiency, and why the team is open-sourcing key components of their model. Ankit breaks down the complexities of multimodal AI, full-duplex conversation modeling, and the computational optimizations that enable low-latency interactions. They also explore the evolution of natural language as a user interface and its potential to redefine human-computer interaction.

Plus, we take audience questions on everything from scaling laws in speech synthesis to the role of in-context learning in making AI voices more expressive.

Key Takeaways:
- How Sesame achieves natural voice interactions through real-time speech generation.
- The impact of open-sourcing their speech model and what it means for AI research.
- The role of full-duplex modeling in improving AI responsiveness.
- How computational efficiency and system latency shape AI conversation quality.
- The growing role of natural language as a user interface in AI-driven experiences.

For anyone interested in AI and voice technology, this episode offers an in-depth look at the latest advancements pushing the boundaries of human-computer interaction.

Follow everyone on X:
Ankit Kumar - https://x.com/_apkumar
Anjney Midha - https://x.com/anjneymidha

Check out everything a16z is doing with artificial intelligence, including articles, projects, and more podcasts here – https://a16z.com/ai/

Chapters:
00:00 - 00:51 | Intro
00:52 - 04:58 | Challenges Of Building
04:59 - 07:45 | Q + A: What Was Done To Bridge Transcription And Text Processing?
07:46 - 09:57 | How Is Sesame So Much Better Than Others?
09:58 - 12:42 | Challenges In Making AI Accessible To All
12:43 - 14:10 | Great Researchers Prioritize User Experience
14:11 - 15:47 | What Is Good Taste In ML?
15:48 - 17:45 | Problems That Can Be Solved That Add Value To The World
17:46 - 26:25 | Open Source Audio For Speech Generation
26:26 - 34:00 | Contextual Speech vs. Text-to-Speech: Differences
34:01 - 35:50 | Value Proposition Of Glasses With No Friction
35:51 - 38:00 | General Purpose API vs Open Source Model
38:01 - 40:47 | Creating High Quality APIs
40:48 - 45:54 | Companions And How Sesame Will Handle Context Retention In Long Conversations
45:55 - 46:59 | Talent: What It Takes To Become A Part Of The Sesame Team
47:00 - 54:37 | How Scaling Laws For Speech Differ From Text
54:38 - 58:33 | How An Organic Conversation Can Be Preserved Using A Voice Companion
58:34 - 1:03:52 | App Building Technology: Roadmap
1:03:53 - 1:09:09 | Architectures and Transformers
1:09:10 - 1:15:56 | The Focus On Personality, And The Differences In Products
1:15:57 - 1:25:25 | New AI Interface: Interacting With AI Companion
1:25:26 - 1:26:56 | Companion Challenges
1:26:57 - 1:29:22 | Computing Interface Of The Future
1:29:23 - 1:31:45 | Focused Product Experience Built By Small Teams
1:31:46 - 1:36:13 | Join Sesame If You Want To Make A Consumer Product People Love
