Update README.md
README.md
CHANGED
@@ -9,4 +9,18 @@ metrics:
base_model:
- openai/whisper-large-v3-turbo
library_name: transformers
---

# Whisper Large V3 Japanese Phone Accent

This is a Whisper model that transcribes Japanese speech into Katakana with pitch accent annotations. It is built on whisper-large-v3-turbo and was fine-tuned on a 1/20 subset of the Galgame-Speech dataset and on the JSUT-5000 dataset.
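
A minimal usage sketch, assuming the checkpoint loads through the standard `transformers` automatic speech recognition pipeline like other Whisper fine-tunes. The repository ID below is a hypothetical placeholder (this README does not state it), and the exact annotation format of the returned text is not shown here because it is not documented above.

```python
# Hypothetical placeholder: replace with the actual Hub repository ID of this model.
MODEL_ID = "your-namespace/your-model-repo"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the model's text output."""
    # Imported lazily so that defining this function does not require transformers.
    from transformers import pipeline

    # Build a speech recognition pipeline around the fine-tuned checkpoint.
    asr = pipeline("automatic-speech-recognition", model=model_id)

    # The pipeline returns a dict; the transcription is stored under "text".
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample.wav")` downloads the checkpoint on first use. Decoding an audio file from a path typically requires `ffmpeg` to be installed.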

## Training Data:
- **Stage 1**: Audio from the Galgame-Speech dataset was used. The transcripts were converted into Katakana sequences with pitch accent annotations using pyopenjtalk.
- **Stage 2**: The JSUT-5000 dataset was used, taking its original training set with pitch accent annotations. The data was split into 90% for training and 10% for evaluation.
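
The 90/10 split in Stage 2 could be reproduced along the following lines. This is only a sketch: the shuffling, the seed, the helper name `split_train_eval`, and the placeholder utterance IDs are all assumptions, since the README does not say how the split was actually drawn.

```python
import random


def split_train_eval(items, eval_fraction=0.1, seed=0):
    """Shuffle items deterministically and hold out a fraction for evaluation."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs (assumed, not from the source).
    random.Random(seed).shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_fraction)
    # The first n_eval items are held out; the remainder is used for training.
    return shuffled[n_eval:], shuffled[:n_eval]


# Placeholder IDs standing in for the real utterance list.
utterance_ids = [f"utt_{i:04d}" for i in range(5000)]
train_ids, eval_ids = split_train_eval(utterance_ids)
print(len(train_ids), len(eval_ids))  # 4500 500
```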

## Evaluation Results:
- The model achieved a character error rate (CER) of approximately 4% on the JSUT-5000 test set, an improvement over the 7% CER of pyopenjtalk.
- Training with Stage 1 alone resulted in a CER of 13%, with errors including specific misreadings and confusion between on'yomi (音読み) and kun'yomi (訓読み) readings. Stage 2 reduced these errors.
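
For reference, CER is conventionally the character-level edit distance between the reference and the hypothesis, divided by the number of reference characters. The sketch below computes it in plain Python. The function names are illustrative, and whether the reported figures count accent symbols as characters is not stated above, so the example uses plain Katakana only.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


# One substituted character out of five reference characters.
print(cer(["コンニチワ"], ["コンニチハ"]))  # 0.2
```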

We are currently seeking datasets annotated with Japanese pitch accent. If you have such data, please reach out!