ByteDance releases Seeduplex, a full-duplex large speech model, ushering AI voice interaction into the "listen and speak simultaneously" era.


AIMPACT reported on April 9 that ByteDance's Seed team has released Seeduplex, a native full-duplex speech language model that is now fully rolled out in the Doubao app, marking an upgrade in voice interaction from "turn-based" exchanges to real-time natural conversation.


Seeduplex jointly models speech and semantics, allowing it to listen and speak at the same time, with significantly improved resistance to interference in complex acoustic environments. According to ByteDance, compared with traditional half-duplex approaches, its incorrect-reply rate and mistaken-interruption rate have both dropped by about 50%.
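To make the half-duplex vs. full-duplex distinction concrete: a half-duplex assistant alternates strictly between listening and replying, while a full-duplex one keeps ingesting audio even while its own reply is playing. The following is a minimal conceptual sketch of that concurrency, not ByteDance's implementation; the coroutines `listen` and `speak` and the simulated frames are invented purely for illustration.

```python
import asyncio

async def listen(incoming: asyncio.Queue) -> None:
    """Continuously ingest (simulated) microphone frames."""
    for frame_id in range(5):
        await asyncio.sleep(0.1)          # pretend to capture 100 ms of audio
        await incoming.put(f"frame-{frame_id}")

async def speak(incoming: asyncio.Queue) -> None:
    """Stream a reply while still draining the input queue."""
    for _ in range(5):
        frame = await incoming.get()
        # A full-duplex model would condition its ongoing reply on `frame`,
        # e.g. stopping mid-utterance if the user barges in.
        print(f"heard {frame} while speaking")

async def main() -> None:
    incoming: asyncio.Queue = asyncio.Queue()
    # Listening and speaking run concurrently: the defining property of
    # full duplex, versus a half-duplex loop that alternates turns.
    await asyncio.gather(listen(incoming), speak(incoming))

asyncio.run(main())
```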


On the interaction side, the model introduces a dynamic stop-and-go mechanism that shortens response latency by about 250 milliseconds, cuts talking-over incidents by 40%, and more accurately distinguishes a user's mid-sentence pauses from the end of their utterance. Through speculative sampling and quantization optimizations, the system also maintains low latency and smooth playback under high concurrency, raising overall call satisfaction by about 8.34%.
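ByteDance has not published how Seeduplex applies speculative sampling, but the general technique is well established: a small draft model cheaply proposes several tokens, and the large target model verifies them, accepting the longest correct prefix. Below is a simplified, deterministic toy sketch under that assumption; the names `draft`, `target`, and the eight-token vocabulary are all invented for illustration.

```python
import random

random.seed(0)
VOCAB = list(range(8))

def draft(prefix):
    # Toy draft model: a real one would condition on `prefix`;
    # this one just guesses a random token.
    return random.choice(VOCAB)

def target(prefix):
    # Toy target model: deterministic, so verification is reproducible.
    return (sum(prefix) + len(prefix)) % 8

def speculative_decode(prefix, k=4, steps=12):
    out = list(prefix)
    while len(out) < steps:
        # 1) The draft model proposes k tokens cheaply.
        ctx, proposal = list(out), []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) The target model verifies the proposal, keeping the longest
        #    matching prefix and emitting its own token at the first mismatch.
        #    (This toy calls `target` per token; a real system scores all k
        #    draft tokens in one batched forward pass, which is the speedup.)
        for t in proposal:
            expected = target(out)
            if t == expected:
                out.append(t)          # accepted draft token
            else:
                out.append(expected)   # rejected: fall back to target token
                break
    return out[:steps]

print(speculative_decode([1, 2]))
```

Because accepted draft tokens skip sequential target-model passes, decoding gets faster without changing the target model's output distribution, which is why the technique suits latency-sensitive, high-concurrency voice serving.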


The upgrade signals that AI voice is evolving toward real-time, multimodal, human-like interaction. Combined with visual capabilities in the future, it is expected to push intelligent assistants toward integrated "listening, seeing, thinking, and speaking." (Source: ByteDance)


