AkitoP/whisper-large-v3-japense-phone_accent

Automatic Speech Recognition

AkitoP committed on Oct 15, 2024

Commit 9623385 · verified · 1 Parent(s): 02c78d0

Update README.md

Files changed (1)

README.md +15 -1

@@ -9,4 +9,18 @@ metrics:
 base_model:
 - openai/whisper-large-v3-turbo
 library_name: transformers
----
+---
+# Whisper Large V3 Japanese Phone Accent
+This Whisper model transcribes Japanese speech into Katakana with pitch accent annotations. It is built on whisper-large-v3-turbo and was fine-tuned on a 1/20 subset of the Galgame-Speech dataset and on the JSUT-5000 dataset.
+## Training Data:
+- **Stage 1**: Audio from the Galgame-Speech dataset. The transcripts were converted into Katakana sequences with pitch accent annotations using pyopenjtalk.
+- **Stage 2**: The original JSUT-5000 training set, which includes pitch accent annotations, split into 90% for training and 10% for evaluation.
+## Evaluation Results:
+- The model achieved a CER (Character Error Rate) of approximately 4% on the JSUT-5000 test set, compared with 7% for pyopenjtalk.
+- Training with Stage 1 alone resulted in a CER of 13%; errors included specific misreadings and confusion between on'yomi (音読み) and kun'yomi (訓読み) readings. Stage 2 reduced these errors.
+We are currently seeking Japanese pitch accent annotated datasets. If you have such data, please reach out!
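
The CER figures quoted in the evaluation results are character-level edit-distance ratios. The README does not say which tool or normalization was used to compute them, so the following is only a minimal sketch of a corpus-level CER, assuming plain Levenshtein distance over characters with no text normalization. The function names `edit_distance` and `cer` and the sample strings are hypothetical and are not taken from the model's evaluation code.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


# Hypothetical example: one substituted character out of five.
score = cer(["コンニチワ"], ["コンニチハ"])
```

Because every character counts equally in this sketch, an accent mark placed in the wrong position would be penalized the same way as a wrong kana.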