Paper Discussion: Dolphin: A Large-Scale ASR Model for Eastern Languages

Hosted by Zhengjie Wang and 3 others

Details

The Whisper model by OpenAI is a significant advancement in Automatic Speech Recognition (ASR). It performs well on a range of languages, and because it is a foundation model, developers can fine-tune it further for other languages.

Today, we're going to look at Dolphin, an ASR model that is a variant of Whisper. It uses a joint CTC-Attention architecture (CTC stands for Connectionist Temporal Classification) with an E-Branchformer encoder to achieve better performance than Whisper-v3 on a range of Eastern languages.
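
For anyone new to the CTC-Attention idea, here is a minimal sketch in plain Python. It is an illustration only and not Dolphin's code: the blank index, the function names, and the default weight of 0.3 are all assumptions made for the example. It shows the two pieces that usually come up in discussion: how a CTC output is collapsed into a label sequence, and how the CTC and attention losses are interpolated during training.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_ids, blank=BLANK):
    """Collapse consecutive repeated frame labels, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [label for label in collapsed if label != blank]


def joint_loss(ctc_loss, attention_loss, ctc_weight=0.3):
    """Interpolate the two losses: w * L_ctc + (1 - w) * L_att.

    The default weight is an illustrative assumption, not a value
    taken from the Dolphin paper.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


if __name__ == "__main__":
    # Per-frame argmax labels; a blank between the two 3s keeps them distinct.
    frames = [0, 3, 3, 0, 0, 3, 5, 5, 0]
    print(ctc_greedy_decode(frames))
    print(joint_loss(2.0, 1.0, ctc_weight=0.5))
```

The CTC branch encourages a monotonic alignment between audio frames and output tokens, while the attention decoder models dependencies between tokens; the weighted sum lets training benefit from both.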

paper: https://arxiv.org/abs/2503.20212
github: https://github.com/DataoceanAI/Dolphin?tab=readme-ov-file
related work:
- https://arxiv.org/abs/2210.00077
- https://huggingface.co/espnet/owsm_v3.1_ebf

Canberra Deep Learning Meetup
Level 3, 44 Sydney Ave, Forrest
FREE