DataoceanAI/dolphin-base

Automatic Speech Recognition

wgb14 committed on Mar 27

Commit 41d0720 · verified · 1 Parent(s): 35c80e0

Update README.md

Files changed (1)

README.md +3 -3

@@ -1,6 +1,6 @@
 # Dolphin
-[Paper]
+[Paper](https://arxiv.org/abs/2503.20212)
 [Github](https://github.com/DataoceanAI/Dolphin)
 [Huggingface](https://huggingface.co/DataoceanAI)
 [Modelscope](https://www.modelscope.cn/organization/DataoceanAI)
@@ -12,7 +12,7 @@ Dolphin is a multilingual, multitask ASR model developed through a collaboration
 ![Mulitask data format](https://raw.githubusercontent.com/DataoceanAI/Dolphin/refs/heads/main/figures/multitask-data-format.png)
 Dolphin largely follows the innovative design approach of [Whisper](https://github.com/openai/whisper) and [OWSM](https://github.com/espnet/espnet/tree/master/egs2/owsm_v3.1/s2t1). A joint CTC-Attention architecture is adopted, with encoder based on E-Branchformer and decoder based on standard Transformer. Several key modifications are introduced for its specific focus on ASR. Dolphin does not support translation tasks, and eliminates the use of previous text and its related tokens.
-A significant enhancement in Dolphin is the introduction of a two-level language token system to better handle linguistic and regional diversity, especially in Dataocean AI dataset. The first token specifies the language (e.g., `<zh>`, `<ja>`), while the second token indicates the region (e.g., `<CN>`, `<JP>`). See details in [paper].
+A significant enhancement in Dolphin is the introduction of a two-level language token system to better handle linguistic and regional diversity, especially in Dataocean AI dataset. The first token specifies the language (e.g., `<zh>`, `<ja>`), while the second token indicates the region (e.g., `<CN>`, `<JP>`). See details in [paper](https://arxiv.org/abs/2503.20212).
 ## Setup
@@ -43,7 +43,7 @@ pip install git+https://github.com/SpeechOceanTech/Dolphin.git
 ### Models
-There are 4 models in Dolphin, and 2 of them are available now. See details in [paper].
+There are 4 models in Dolphin, and 2 of them are available now. See details in [paper](https://arxiv.org/abs/2503.20212).
 |  Model  | Parameters | Average WER | Publicly Available |
 |:------:|:----------:|:------------------:|:------------------:|
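
The two-level language token scheme described in the changed README text (a language token such as `<zh>` followed by a region token such as `<CN>`) can be illustrated with a small sketch. The helper names `language_prefix` and `split_prefix`, and the lowercase/uppercase normalization, are hypothetical and inferred only from the README's examples; they are not part of the Dolphin package and may not match how the model's tokenizer actually handles these tokens.

```python
import re

# Hypothetical helpers for illustration only; not part of the Dolphin API.
_CODE = re.compile(r"[A-Za-z]+")
_PREFIX = re.compile(r"<([a-z]+)><([A-Z]+)>")


def language_prefix(lang: str, region: str) -> str:
    """Build a two-level prefix such as "<zh><CN>" from language and region codes."""
    if not _CODE.fullmatch(lang) or not _CODE.fullmatch(region):
        raise ValueError(f"expected alphabetic codes, got {lang!r} and {region!r}")
    # Assumed convention, based on the README examples:
    # lowercase language code, uppercase region code.
    return f"<{lang.lower()}><{region.upper()}>"


def split_prefix(text: str) -> tuple[str, str, str]:
    """Split "<lang><REGION>rest" into (lang, region, rest)."""
    match = _PREFIX.match(text)
    if match is None:
        raise ValueError("text does not start with a <lang><REGION> prefix")
    return match.group(1), match.group(2), text[match.end():]


print(language_prefix("zh", "CN"))     # <zh><CN>
print(split_prefix("<ja><JP>hello"))   # ('ja', 'JP', 'hello')
```

Keeping language and region as separate tokens means one language code can pair with several regions without a dedicated token for every combination, which is consistent with the README's stated goal of handling regional diversity.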