Paper Discussion: Dolphin: A Large-Scale ASR Model for Eastern Languages

Details
The Whisper model by OpenAI is a significant advance in automatic speech recognition (ASR). It performs well on a range of languages, and because it is a foundation model, developers can fine-tune it further on other languages.
Today we're going to look at Dolphin, a Whisper-style ASR model. It uses a joint CTC-attention architecture (CTC stands for Connectionist Temporal Classification) with an E-Branchformer encoder, and reports better performance than Whisper-v3 on a range of Eastern languages.
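For anyone new to joint CTC-attention training, here is a minimal sketch of the idea: the encoder's CTC loss and the decoder's attention (cross-entropy) loss are blended into one training objective with a single weight. The function name and the default weight of 0.3 are illustrative assumptions, not values taken from the Dolphin paper or code.

```python
def joint_ctc_attention_loss(ctc_loss: float,
                             attention_loss: float,
                             ctc_weight: float = 0.3) -> float:
    """Blend a CTC loss and an attention-decoder loss into one objective.

    ctc_weight is a hypothetical default chosen for illustration only;
    it is not taken from the Dolphin paper.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    # Linear interpolation: ctc_weight=1.0 trains with CTC only,
    # ctc_weight=0.0 trains with the attention decoder only.
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


if __name__ == "__main__":
    # Example with made-up loss values for a single batch.
    print(joint_ctc_attention_loss(ctc_loss=2.0, attention_loss=1.0))
```

In a real training loop the two inputs would be tensors produced by the model's CTC head and decoder; plain floats are used here only to keep the sketch self-contained.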
paper: https://arxiv.org/abs/2503.20212
github: https://github.com/DataoceanAI/Dolphin?tab=readme-ov-file
related work:
- https://arxiv.org/abs/2210.00077
- https://huggingface.co/espnet/owsm_v3.1_ebf
