ChatGPT voice mode will become more seamless with the new real-time model

Investing.com – According to The Information, OpenAI is developing a new audio model aimed at making conversations with ChatGPT feel less mechanical. The model lets the AI adjust its responses in real time when it is interrupted.

Currently, the advanced voice mode in ChatGPT uses a turn-based system, requiring users to finish speaking before the AI processes the audio and generates a response. If users interrupt with words like “okay” or “uh-huh,” the model completely stops speaking instead of continuing the conversation naturally.

This new model, called BiDi (Bidirectional), is designed to process the speaker’s voice continuously so that it can adjust its responses immediately when interrupted. Existing audio models, by contrast, produce a fixed response that cannot be changed once the AI starts speaking, so the new approach should make conversations feel more natural.

However, the technology is not yet ready for release. According to a person familiar with the project, the prototype often begins to malfunction or produce abnormal sounds after a few minutes of conversation. Although OpenAI initially aimed to release BiDi in the first quarter of this year, the schedule may slip to the second quarter or later.

OpenAI believes narrowing the performance gap between speech and text-based models will expand AI usage worldwide, as most people find talking to AI assistants more natural than sending text messages.

The BiDi model is expected to be especially useful for customer support applications. For example, if a customer calling a retailer’s AI support agent decides to exchange an item instead of returning it, BiDi could allow the agent to redirect the conversation smoothly without stopping or becoming confused.

A person familiar with the audio model also said it performs better when using external tools and applications. OpenAI has previously been reported to be planning improvements to its audio models for future AI-powered devices operated primarily by voice, and to be considering a smart speaker that can check email or book services via voice commands.

This article was translated with the assistance of artificial intelligence. For more information, please see our Terms of Use.
