Files changed (1)
  1. README.md +27 -10
README.md CHANGED
@@ -1,16 +1,22 @@
# Dolphin

- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. It supports 40 Eastern languages and has been trained on a large-scale dataset of 210,000 hours, which includes both DataoceanAI's proprietary datasets and open-source datasets. The model can perform speech recognition and language identification.

## Approach

- ![Mulitask data format](https://raw.githubusercontent.com/DataoceanAI/Dolphin/refs/heads/main/multitask-data-format.png)

- Dolphin is built on Whisper and OWSM, using an attention-based encoder-decoder architecture. The encoder is Ebranchformer and the decoder is Transformer. Dolphin focuses on automatic speech recognition (ASR), its multitask data format is slightly different from Whisper's. Dolphin does not support Translation.
- In addition,base on the characteristics of the DataocanAI dataset, Dolphin introduces region-specific tokens for different languages, enabling support for dialects.

## Setup
- Dolphin depends on ffmpeg to convert audio to WAV. If your OS does not have ffmpeg, please install it first.

```shell
# Ubuntu or Debian
@@ -28,16 +34,27 @@ You can install the latest version of Dolphin using the following command:
pip install -U dataoceanai-dolphin
```

- Additionally, it can also be installed from source using the following command:
```shell
pip install git+https://github.com/SpeechOceanTech/Dolphin.git
```

- ## Available model and languages

### Languages

- Dolphin covers 40 [Eastern languages](./languages.md) and supports 22 Chinese dialects.

## Usage

@@ -56,7 +73,7 @@ dolphin audio.wav --model small --model_dir /data/models/dolphin/ --lang_sym "zh
dolphin audio.wav --model small --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN" --padding_speech true
```

- #### Python usage

```python
import dolphin
@@ -71,4 +88,4 @@ print(result.text)
## License

- Dolphin's code and model weights are released under the Apache 2.0 License.
 
# Dolphin

+ [Paper]
+ [Github](https://github.com/DataoceanAI/Dolphin)
+ [Huggingface](https://huggingface.co/DataoceanAI)
+ [Modelscope](https://www.modelscope.cn/organization/DataoceanAI)
+
+ Dolphin is a multilingual, multitask ASR model developed jointly by Dataocean AI and Tsinghua University. It supports 40 Eastern languages across East Asia, South Asia, Southeast Asia, and the Middle East, as well as 22 Chinese dialects. The model is trained on over 210,000 hours of data, combining DataoceanAI's proprietary datasets with open-source datasets, and can perform speech recognition, voice activity detection (VAD), segmentation, and language identification (LID).

## Approach

+ ![Multitask data format](https://raw.githubusercontent.com/DataoceanAI/Dolphin/refs/heads/main/figures/multitask-data-format.png)
+ Dolphin largely follows the design approach of [Whisper](https://github.com/openai/whisper) and [OWSM](https://github.com/espnet/espnet/tree/master/egs2/owsm_v3.1/s2t1). It adopts a joint CTC-Attention architecture, with an E-Branchformer encoder and a standard Transformer decoder. Because Dolphin focuses specifically on ASR, it introduces several key modifications: it does not support translation tasks, and it eliminates the use of previous text and its related tokens.
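To make the joint CTC-Attention objective concrete, here is a minimal PyTorch-style sketch of hybrid CTC/attention training. It illustrates the general technique only and is not Dolphin's actual training code; the tensor shapes, vocabulary size, and the 0.3 CTC weight are placeholder assumptions.

```python
import torch.nn as nn

CTC_WEIGHT = 0.3          # assumed interpolation weight, not taken from the paper
VOCAB_SIZE = 8000         # placeholder vocabulary size
BLANK_ID, IGNORE_ID = 0, -1

ctc_loss_fn = nn.CTCLoss(blank=BLANK_ID, zero_infinity=True)
att_loss_fn = nn.CrossEntropyLoss(ignore_index=IGNORE_ID)

def joint_ctc_attention_loss(ctc_log_probs,    # (T, B, V) log-probs from the encoder's CTC head
                             encoder_lengths,  # (B,) valid frame counts per utterance
                             ctc_targets,      # (B, S) label ids without blanks
                             target_lengths,   # (B,) label counts per utterance
                             decoder_logits,   # (B, L, V) logits from the attention decoder
                             decoder_targets): # (B, L) shifted targets, padded with IGNORE_ID
    """Interpolate the CTC branch and the attention branch, as in hybrid CTC/attention ASR."""
    ctc = ctc_loss_fn(ctc_log_probs, ctc_targets, encoder_lengths, target_lengths)
    att = att_loss_fn(decoder_logits.reshape(-1, VOCAB_SIZE), decoder_targets.reshape(-1))
    return CTC_WEIGHT * ctc + (1.0 - CTC_WEIGHT) * att
```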
+
+ A significant enhancement in Dolphin is the introduction of a two-level language token system to better handle linguistic and regional diversity, especially in the Dataocean AI dataset. The first token specifies the language (e.g., `<zh>`, `<ja>`), while the second token indicates the region (e.g., `<CN>`, `<JP>`). See the [paper] for details.
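As a concrete illustration, here is a minimal Python sketch of how the two token levels map onto the `lang_sym` and `region_sym` arguments used in the Usage section below. The `load_audio`/`load_model` helper names, the model directory, and the `"cuda"` device argument are assumptions for the example rather than a definitive API reference.

```python
import dolphin

# Assumed helpers mirroring the Usage section: load an audio file and a released model.
waveform = dolphin.load_audio("audio.wav")
model = dolphin.load_model("small", "/data/models/dolphin", "cuda")

# lang_sym picks the language token (<zh>); region_sym picks the region token (<CN>).
result = model(waveform, lang_sym="zh", region_sym="CN")
print(result.text)
```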

## Setup
+ Dolphin requires FFmpeg to convert audio files to WAV format. If FFmpeg is not installed on your system, please install it first:

```shell
  # Ubuntu or Debian
 
pip install -U dataoceanai-dolphin
```
 
+ Alternatively, it can be installed from source:
```shell
pip install git+https://github.com/SpeechOceanTech/Dolphin.git
```

+ ## Available Models and Languages
+
+ ### Models
+
+ Dolphin comes in four model sizes, two of which are publicly available now. See the [paper] for details.
+
+ | Model  | Parameters | Average WER (%) | Publicly Available |
+ |:------:|:----------:|:---------------:|:------------------:|
+ | base   | 140 M      | 33.3            | ✅                  |
+ | small  | 372 M      | 25.2            | ✅                  |
+ | medium | 910 M      | 23.1            |                     |
+ | large  | 1679 M     | 21.6            |                     |

### Languages

+ Dolphin supports 40 Eastern languages and 22 Chinese dialects. For a complete list of supported languages, see [languages.md](https://github.com/DataoceanAI/Dolphin/blob/main/languages.md).

## Usage

  dolphin audio.wav --model small --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN" --padding_speech true
```

+ ### Python usage

```python
import dolphin
 
## License

+ Dolphin's code and model weights are released under the [Apache 2.0 License](https://github.com/DataoceanAI/Dolphin/blob/main/LICENSE).