DataoceanAI committed on
Commit b58e380 · verified · 1 Parent(s): c255ac1

init model

Files changed (5)
  1. README.md +74 -3
  2. base.pt +3 -0
  3. bpe.model +3 -0
  4. config.yaml +0 -0
  5. feats_stats.npz +3 -0
README.md CHANGED
@@ -1,3 +1,74 @@
- ---
- license: apache-2.0
- ---
+ # Dolphin
+
+ Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. It supports 40 Eastern languages and was trained on 210,000 hours of audio drawn from DataoceanAI's proprietary datasets and open-source datasets. The model performs speech recognition and language identification.
+
+ ## Approach
+
+ ![Multitask data format](https://raw.githubusercontent.com/DataoceanAI/Dolphin/refs/heads/main/multitask-data-format.png)
+
+ Dolphin builds on Whisper and OWSM, using an attention-based encoder-decoder architecture. The encoder is an E-Branchformer and the decoder is a Transformer. Dolphin focuses on automatic speech recognition (ASR); its multitask data format is slightly different from Whisper's. Dolphin does not support translation.
+ In addition, based on the characteristics of the DataoceanAI dataset, Dolphin introduces region-specific tokens for different languages, enabling support for dialects.
+
+ ## Setup
+ Dolphin depends on ffmpeg to convert audio to WAV. If ffmpeg is not already installed on your system, install it first.
+
+ ```shell
+ # Ubuntu or Debian
+ sudo apt update && sudo apt install ffmpeg
+
+ # macOS
+ brew install ffmpeg
+
+ # Windows
+ choco install ffmpeg
+ ```
+
+ You can install the latest version of Dolphin using the following command:
+ ```shell
+ pip install -U dataoceanai-dolphin
+ ```
+
+ It can also be installed from source:
+ ```shell
+ pip install git+https://github.com/SpeechOceanTech/Dolphin.git
+ ```
+
+ ## Available model and languages
+
+ ### Languages
+
+ Dolphin covers 40 [Eastern languages](./languages.md) and supports 22 Chinese dialects.
+
+ ## Usage
+
+ ### Command-line usage
+
+ ```shell
+ dolphin audio.wav
+
+ # Download model and specify the model path
+ dolphin audio.wav --model small --model_dir /data/models/dolphin/
+
+ # Specify language and region
+ dolphin audio.wav --model small --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN"
+
+ # Pad speech to 30 seconds
+ dolphin audio.wav --model small --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN" --padding_speech true
+ ```
+
+ ### Python usage
+
+ ```python
+ import dolphin
+
+ # Load the audio file, then load the "small" model from the model directory onto the GPU
+ waveform = dolphin.load_audio("audio.wav")
+ model = dolphin.load_model("small", "/data/models/dolphin", "cuda")
+
+ # Transcribe with default settings
+ result = model(waveform)
+ # Specify language and region
+ result = model(waveform, lang_sym="zh", region_sym="CN")
+ print(result.text)
+ ```
+
+ ## License
+
+ Dolphin's code and model weights are released under the Apache 2.0 License.
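
Reviewer sketch, not part of the commit: the README's Setup and Command-line usage sections could be scripted roughly as below. The helper names `ffmpeg_available` and `build_dolphin_command` are hypothetical and are not provided by the Dolphin package. Only the flags that appear in the README (`--model`, `--model_dir`, `--lang_sym`, `--region_sym`, `--padding_speech`) are used, and their exact behavior is assumed to match the README examples.

```python
import shutil
from typing import List, Optional


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def build_dolphin_command(
    audio_path: str,
    model: Optional[str] = None,
    model_dir: Optional[str] = None,
    lang_sym: Optional[str] = None,
    region_sym: Optional[str] = None,
    padding_speech: bool = False,
) -> List[str]:
    """Assemble an argument list using only the flags shown in the README."""
    cmd = ["dolphin", audio_path]
    if model is not None:
        cmd += ["--model", model]
    if model_dir is not None:
        cmd += ["--model_dir", model_dir]
    if lang_sym is not None:
        cmd += ["--lang_sym", lang_sym]
    if region_sym is not None:
        cmd += ["--region_sym", region_sym]
    if padding_speech:
        # The README passes the literal string "true" to this flag.
        cmd += ["--padding_speech", "true"]
    return cmd


if __name__ == "__main__":
    if not ffmpeg_available():
        print("ffmpeg not found on PATH; install it before running dolphin")
    print(build_dolphin_command("audio.wav", model="small",
                                model_dir="/data/models/dolphin/",
                                lang_sym="zh", region_sym="CN"))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the `dolphin` command is installed. Building a list rather than a single shell string avoids quoting problems with paths that contain spaces.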
base.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:688f0cdb26da2684a4eec200a432091920287585e8e332507cbe9c1ab6d77401
+ size 561091890
bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b9102181ef1a2a3c42ce8fbca8a545ea4a55bce47ba7a5222951ab5bb21bb3c
+ size 854022
config.yaml ADDED
The diff for this file is too large to render. See raw diff
 
feats_stats.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5a37d00c07d595dbc2479b31be42b3c75de422469a947ce4b7bda193c3b1de7f
+ size 1402
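
Reviewer sketch, not part of the commit: the three binary files above are stored as Git LFS pointers, each recording a spec version, a SHA-256 object ID, and a size in bytes. A downloaded file could be checked against its pointer roughly as follows. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the sketch assumes the `oid` always uses `sha256`, as it does in all three pointers here.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer) -> bool:
    """Check a file's size and SHA-256 digest against a parsed pointer."""
    p = Path(path)
    if p.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with p.open("rb") as f:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# Pointer text for feats_stats.npz, copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:5a37d00c07d595dbc2479b31be42b3c75de422469a947ce4b7bda193c3b1de7f
size 1402
"""
pointer = parse_lfs_pointer(POINTER)
```

Calling `matches_pointer("feats_stats.npz", pointer)` on a locally downloaded copy would then report whether its size and digest agree with the values recorded in this commit.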