Update README.md
README.md
CHANGED
@@ -26,7 +26,7 @@ datasets:
 # Kotoba-Whisper-v2.1
 _Kotoba-Whisper-v2.1_ is a Japanese ASR model based on [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0), with
 additional postprocessing stacks integrated as [`pipeline`](https://huggingface.co/docs/transformers/en/main_classes/pipelines). The new features includes
-
+adding punctuation with [punctuators](https://github.com/1-800-BAD-CODE/punctuators/tree/main).
 These libraries are merged into Kotoba-Whisper-v2.1 via pipeline and will be applied seamlessly to the predicted transcription from [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0).
 The pipeline has been developed through the collaboration between [Asahi Ushio](https://asahiushio.com) and [Kotoba Technologies](https://twitter.com/kotoba_tech)
 
@@ -38,15 +38,9 @@ along with the.
 | model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
 |:--------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------:|----------------------------------------------------------------------------------------:|------------------------------------------------------------------------------------------------------------:|
 | [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 17.6 | 15.4 | 17.4 |
-| [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1)
-| [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) (punctuator + stable-ts) | 17.7 | 15.4 | 17 |
-| [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) (punctuator) | 17.7 | 15.4 | 17 |
-| [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) (stable-ts) | 17.7 | 15.4 | 17 |
+| [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) | 17.7 | 15.4 | 17 | -->
 | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 17.8 | 15.2 | 17.8 |
-| [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1)
-| [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) (punctuator + stable-ts) | 17.9 | 15 | 17.8 |
-| [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) (punctuator) | 17.9 | 15 | 17.8 |
-| [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) (stable-ts) | 17.9 | 15 | 17.8 |
+| [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) | 17.9 | 15 | 17.8 |
 | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 15.3 | 13.4 | 20.5 |
 | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 15.9 | 10.6 | 34.6 |
 | [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 16.6 | 11.3 | 40.7 |
@@ -97,7 +91,6 @@ pipe = pipeline(
     chunk_length_s=15,
     batch_size=16,
     trust_remote_code=True,
-    stable_ts=True,
     punctuator=True
 )
 
@@ -116,12 +109,6 @@ print(result)
 + result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
 ```
 
-- To deactivate stable-ts:
-```diff
-- stable_ts=True,
-+ stable_ts=False,
-```
-
 - To deactivate punctuator:
 ```diff
 - punctuator=True,
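For reference, the pipeline call in the README after this change can be sketched as follows. This is only an illustrative sketch: the helper names `build_pipeline_kwargs` and `load_pipeline` are hypothetical, and selecting the model through a `model=` keyword with its Hub id is an assumption, since that argument is not visible in the diff. The remaining keyword arguments are the ones shown in the updated snippet, with `stable_ts` no longer passed.

```python
"""Sketch of the README pipeline call once `stable_ts` is no longer passed."""

# Model id taken from the README links; passing it as `model=` is an assumption.
MODEL_ID = "kotoba-tech/kotoba-whisper-v2.1"


def build_pipeline_kwargs(punctuator: bool = True) -> dict:
    """Collect the keyword arguments visible in the updated README snippet.

    `stable_ts` is intentionally absent: this commit removes it from the call.
    """
    return {
        "model": MODEL_ID,
        "chunk_length_s": 15,
        "batch_size": 16,
        "trust_remote_code": True,  # the postprocessing ships as custom pipeline code
        "punctuator": punctuator,   # set to False to skip the punctuation step
    }


def load_pipeline(punctuator: bool = True):
    """Hypothetical helper: build the pipeline (downloads the model when called)."""
    # Imported lazily so the kwargs above can be inspected without transformers.
    from transformers import pipeline

    return pipeline(**build_pipeline_kwargs(punctuator))


# Inspect the arguments without loading any model.
kwargs = build_pipeline_kwargs()
print(kwargs)
```

Calling `load_pipeline()` and then `pipe("audio.mp3", return_timestamps=True)` would mirror the usage line in the README; that step needs `transformers`, the punctuation dependency, and a model download, so it is left out of the runnable part here.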