Files changed (1)
  1. README.md +27 -10
README.md CHANGED
@@ -1,16 +1,22 @@
# Dolphin

- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. It supports 40 Eastern languages and has been trained on a large-scale dataset of 210,000 hours, which includes both DataoceanAI's proprietary datasets and open-source datasets. The model can perform speech recognition and language identification.

## Approach

- ![Mulitask data format](https://raw.githubusercontent.com/DataoceanAI/Dolphin/refs/heads/main/multitask-data-format.png)

- Dolphin is built on Whisper and OWSM, using an attention-based encoder-decoder architecture. The encoder is Ebranchformer and the decoder is Transformer. Dolphin focuses on automatic speech recognition (ASR), its multitask data format is slightly different from Whisper's. Dolphin does not support Translation.
- In addition,base on the characteristics of the DataocanAI dataset, Dolphin introduces region-specific tokens for different languages, enabling support for dialects.

## Setup
- Dolphin depends on ffmpeg to convert audio to WAV. If your OS does not have ffmpeg, please install it first.

```shell
# Ubuntu or Debian
@@ -28,16 +34,27 @@ You can install the latest version of Dolphin using the following command:
pip install -U dataoceanai-dolphin
```

- Additionally, it can also be installed from source using the following command:
```shell
pip install git+https://github.com/SpeechOceanTech/Dolphin.git
```

- ## Available model and languages

### Languages

- Dolphin covers 40 [Eastern languages](./languages.md) and supports 22 Chinese dialects.

## Usage

@@ -56,7 +73,7 @@ dolphin audio.wav --model small --model_dir /data/models/dolphin/ --lang_sym "zh
dolphin audio.wav --model small --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN" --padding_speech true
```

- #### Python usage

```python
import dolphin
@@ -71,4 +88,4 @@ print(result.text)
## License

- Dolphin's code and model weights are released under the Apache 2.0 License.
 
# Dolphin

+ [Paper]
+ [Github](https://github.com/DataoceanAI/Dolphin)
+ [Huggingface](https://huggingface.co/DataoceanAI)
+ [Modelscope](https://www.modelscope.cn/organization/DataoceanAI)
+
+ Dolphin is a multilingual, multitask ASR model developed jointly by Dataocean AI and Tsinghua University. It supports 40 Eastern languages across East Asia, South Asia, Southeast Asia, and the Middle East, as well as 22 Chinese dialects. The model is trained on over 210,000 hours of data, combining DataoceanAI's proprietary datasets with open-source datasets, and can perform speech recognition, voice activity detection (VAD), segmentation, and language identification (LID).

## Approach

+ ![Multitask data format](https://raw.githubusercontent.com/DataoceanAI/Dolphin/refs/heads/main/figures/multitask-data-format.png)
+ Dolphin largely follows the design approach of [Whisper](https://github.com/openai/whisper) and [OWSM](https://github.com/espnet/espnet/tree/master/egs2/owsm_v3.1/s2t1). It adopts a joint CTC-Attention architecture, with an E-Branchformer encoder and a standard Transformer decoder. Because Dolphin focuses specifically on ASR, it introduces several key modifications: it does not support translation tasks, and it eliminates the use of previous text and its related tokens.
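To make the joint CTC-Attention objective concrete, here is a minimal PyTorch-style sketch of hybrid CTC/attention training. It illustrates the general technique only and is not Dolphin's actual training code; the tensor shapes, vocabulary size, and the 0.3 CTC weight are placeholder assumptions.

```python
import torch.nn as nn

CTC_WEIGHT = 0.3          # assumed interpolation weight, not taken from the paper
VOCAB_SIZE = 8000         # placeholder vocabulary size
BLANK_ID, IGNORE_ID = 0, -1

ctc_loss_fn = nn.CTCLoss(blank=BLANK_ID, zero_infinity=True)
att_loss_fn = nn.CrossEntropyLoss(ignore_index=IGNORE_ID)

def joint_ctc_attention_loss(ctc_log_probs,    # (T, B, V) log-probs from the encoder's CTC head
                             encoder_lengths,  # (B,) valid frame counts per utterance
                             ctc_targets,      # (B, S) label ids without blanks
                             target_lengths,   # (B,) label counts per utterance
                             decoder_logits,   # (B, L, V) logits from the attention decoder
                             decoder_targets): # (B, L) shifted targets, padded with IGNORE_ID
    """Interpolate the CTC branch and the attention branch, as in hybrid CTC/attention ASR."""
    ctc = ctc_loss_fn(ctc_log_probs, ctc_targets, encoder_lengths, target_lengths)
    att = att_loss_fn(decoder_logits.reshape(-1, VOCAB_SIZE), decoder_targets.reshape(-1))
    return CTC_WEIGHT * ctc + (1.0 - CTC_WEIGHT) * att
```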
+
+ A significant enhancement in Dolphin is the introduction of a two-level language token system to better handle linguistic and regional diversity, especially in the Dataocean AI dataset. The first token specifies the language (e.g., `<zh>`, `<ja>`), while the second token indicates the region (e.g., `<CN>`, `<JP>`). See the [paper] for details.
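As a concrete illustration, here is a minimal Python sketch of how the two token levels map onto the `lang_sym` and `region_sym` arguments used in the Usage section below. The `load_audio`/`load_model` helper names, the model directory, and the `"cuda"` device argument are assumptions for the example rather than a definitive API reference.

```python
import dolphin

# Assumed helpers mirroring the Usage section: load an audio file and a released model.
waveform = dolphin.load_audio("audio.wav")
model = dolphin.load_model("small", "/data/models/dolphin", "cuda")

# lang_sym picks the language token (<zh>); region_sym picks the region token (<CN>).
result = model(waveform, lang_sym="zh", region_sym="CN")
print(result.text)
```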

## Setup
+ Dolphin requires FFmpeg to convert audio files to WAV format. If FFmpeg is not installed on your system, please install it first:

```shell
  # Ubuntu or Debian
 
pip install -U dataoceanai-dolphin
```
 
+ Alternatively, it can be installed from source:
```shell
pip install git+https://github.com/SpeechOceanTech/Dolphin.git
```

+ ## Available Models and Languages
+
+ ### Models
+
+ Dolphin comes in four model sizes, two of which are publicly available now. See the [paper] for details.
+
+ | Model  | Parameters | Average WER (%) | Publicly Available |
+ |:------:|:----------:|:---------------:|:------------------:|
+ | base   | 140 M      | 33.3            | ✅                  |
+ | small  | 372 M      | 25.2            | ✅                  |
+ | medium | 910 M      | 23.1            |                     |
+ | large  | 1679 M     | 21.6            |                     |

### Languages

+ Dolphin supports 40 Eastern languages and 22 Chinese dialects. For a complete list of supported languages, see [languages.md](https://github.com/DataoceanAI/Dolphin/blob/main/languages.md).

## Usage

  dolphin audio.wav --model small --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN" --padding_speech true
```

+ ### Python usage

```python
import dolphin
 
## License

+ Dolphin's code and model weights are released under the [Apache 2.0 License](https://github.com/DataoceanAI/Dolphin/blob/main/LICENSE).