ChatGPT voice mode will become more seamless with the new real-time model

LootboxPhobia · 2026-03-06T00:05:27+00:00

OpenAI is developing a new audio model called BiDi, aimed at making conversations with ChatGPT more natural. The model can adjust responses in real-time rather than stopping the conversation when the user interrupts. Although the release of the BiDi model may be delayed until the second quarter or later, it is expected to improve the voice interaction experience of AI assistants, especially for applications like customer support.

Investing.com – According to The Information, OpenAI is developing a new audio model aimed at making conversations with ChatGPT feel less mechanical. The model allows AI to adjust its responses in real-time when interrupted.

Currently, the advanced voice mode in ChatGPT uses a turn-based system, requiring users to finish speaking before the AI processes the audio and generates a response. If users interrupt with words like “okay” or “uh-huh,” the model completely stops speaking instead of continuing the conversation naturally.

The new model, called BiDi (short for bidirectional), is designed to process the speaker’s voice continuously so that it can adjust its response immediately when interrupted. Existing audio models commit to a fixed response once the AI starts speaking and cannot change it, so continuous processing should make conversations feel more natural.
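The difference between the two behaviors can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not OpenAI's design: it works on words rather than audio, and the names `BACKCHANNELS`, `turn_based`, `bidirectional` and `revise` are hypothetical, introduced here for illustration.

```python
# Toy model: a reply is a list of words, and user_events maps a word
# position to something the user said while that word was about to play.
# BACKCHANNELS is a hypothetical list of filler acknowledgements.
BACKCHANNELS = {"okay", "uh-huh", "mm-hmm", "right"}


def turn_based(reply_words, user_events):
    """Any user sound, even a filler word, halts the fixed reply."""
    spoken = []
    for i, word in enumerate(reply_words):
        if i in user_events:
            break  # the reply cannot be changed, so it simply stops
        spoken.append(word)
    return spoken


def bidirectional(reply_words, user_events, revise):
    """Keep listening while speaking: ignore fillers, re-plan on real interruptions."""
    words = list(reply_words)
    spoken = []
    i = 0
    while i < len(words):
        heard = user_events.get(i)
        if heard is not None and heard.lower() not in BACKCHANNELS:
            # Real interruption: replace the unspoken remainder of the reply.
            words = words[:i] + list(revise(heard))
            if i >= len(words):
                break  # the revised plan has nothing left to say
        spoken.append(words[i])
        i += 1
    return spoken


reply = "Your refund will arrive in five days".split()


def revise(heard):
    # Hypothetical re-planner; a real system would generate this from the model.
    return ["Sure,", "switching", "to", "an", "exchange"]


print(turn_based(reply, {2: "uh-huh"}))
print(bidirectional(reply, {2: "uh-huh"}, revise))
print(bidirectional(reply, {3: "actually, exchange it"}, revise))
```

In this sketch a filler such as "uh-huh" cuts the turn-based reply short, while the bidirectional loop speaks through it and only changes course when the user says something substantive.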

However, the technology is not yet ready for release. According to a person familiar with the project, the prototype often begins to malfunction or produce abnormal-sounding audio after a few minutes of conversation. OpenAI initially aimed to release BiDi in the first quarter of this year, but the schedule may slip to the second quarter or later.

OpenAI believes narrowing the performance gap between speech and text-based models will expand AI usage worldwide, as most people find talking to AI assistants more natural than sending text messages.

The BiDi model is expected to be especially useful for customer support applications. For example, if a customer calling a retailer’s AI support agent decides mid-call to exchange an item instead of returning it, BiDi could allow the agent to switch course smoothly without stopping or becoming confused.

A person familiar with the audio model also said it performs better when using external tools and applications. OpenAI has previously been reported to be planning improvements to its audio models for future AI-powered devices operated primarily by voice, and to be considering a smart speaker that can check email or book services through voice commands.

This article was translated with the assistance of artificial intelligence. For more information, please see our Terms of Use.
