DUB, the first open, human-evaluated benchmark designed to assess AI dubbing systems on emotional accuracy, prosody, and voice character across languages. Using more than 30,000 native-speaker A/B e ...
Small language models are like specialised tools in a toolbox, whereas something like ChatGPT brings the whole workshop.
Abstract: In hours-long meeting scenarios, real-time speech streams often struggle to achieve accurate speaker diarization, commonly leading to speaker identification and speaker count errors. To ...
Abstract: Speaker Diarization (SD) is a crucial component of modern end-to-end ASR pipelines. Traditional SD systems, which are typically audio-based and operate independently of ASR, often introduce ...