kotoba-tech/kotoba-whisper-v1.0-faster

Automatic Speech Recognition

asahi417 committed on May 8, 2024

Commit 1482df5 · verified · 1 Parent(s): 5417738

Create benchmark.sh

Files changed (1)

benchmark.sh +23 -0

benchmark.sh ADDED

	@@ -0,0 +1,23 @@

+# clone dataset
+git clone https://huggingface.co/datasets/kotoba-tech/kotoba-whisper-eval
+# convert to 16 kHz mono 16-bit PCM WAV
+ffmpeg -i kotoba-whisper-eval/audio/long_interview_1.mp3 -ar 16000 -ac 1 -c:a pcm_s16le kotoba-whisper-eval/audio/long_interview_1.wav
+ffmpeg -i kotoba-whisper-eval/audio/manzai1.mp3 -ar 16000 -ac 1 -c:a pcm_s16le kotoba-whisper-eval/audio/manzai1.wav
+ffmpeg -i kotoba-whisper-eval/audio/manzai2.mp3 -ar 16000 -ac 1 -c:a pcm_s16le kotoba-whisper-eval/audio/manzai2.wav
+ffmpeg -i kotoba-whisper-eval/audio/manzai3.mp3 -ar 16000 -ac 1 -c:a pcm_s16le kotoba-whisper-eval/audio/manzai3.wav
+# download and cache the model so the timed runs below exclude the download
+python -c 'from faster_whisper import WhisperModel; model = WhisperModel("kotoba-tech/kotoba-whisper-v1.0-faster")'
+SECONDS=0
+python -c 'from faster_whisper import WhisperModel; model = WhisperModel("kotoba-tech/kotoba-whisper-v1.0-faster"); segments, _ = model.transcribe("kotoba-whisper-eval/audio/long_interview_1.wav", language="ja", chunk_length=15, condition_on_previous_text=False); print(*("[%.2fs -> %.2fs] %s" % (s.start, s.end, s.text) for s in segments), sep="\n")'
+TIME_INTERVIEW=$SECONDS
+SECONDS=0
+python -c 'from faster_whisper import WhisperModel; model = WhisperModel("kotoba-tech/kotoba-whisper-v1.0-faster"); segments, _ = model.transcribe("kotoba-whisper-eval/audio/manzai1.wav", language="ja", chunk_length=15, condition_on_previous_text=False); list(segments)'
+TIME_MANZAI1=$SECONDS
+SECONDS=0
+python -c 'from faster_whisper import WhisperModel; model = WhisperModel("kotoba-tech/kotoba-whisper-v1.0-faster"); segments, _ = model.transcribe("kotoba-whisper-eval/audio/manzai2.wav", language="ja", chunk_length=15, condition_on_previous_text=False); list(segments)'
+TIME_MANZAI2=$SECONDS
+SECONDS=0
+python -c 'from faster_whisper import WhisperModel; model = WhisperModel("kotoba-tech/kotoba-whisper-v1.0-faster"); segments, _ = model.transcribe("kotoba-whisper-eval/audio/manzai3.wav", language="ja", chunk_length=15, condition_on_previous_text=False); list(segments)'
+TIME_MANZAI3=$SECONDS
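
The script stores each elapsed time in a `TIME_*` variable but never prints it. A minimal sketch of a reporting step that could follow the last measurement is shown below. The helpers `format_seconds` and `report_timings` are hypothetical names that are not part of this commit, and the `:-0` defaults exist only so the snippet runs on its own.

```shell
# Hypothetical helper (not in the commit): render whole seconds as "XmYYs".
format_seconds() {
  printf '%dm%02ds' "$(( $1 / 60 ))" "$(( $1 % 60 ))"
}

# Hypothetical reporting step: print one line per benchmark file.
# The TIME_* names match benchmark.sh; unset values fall back to 0.
report_timings() {
  printf 'interview: %s\n' "$(format_seconds "${TIME_INTERVIEW:-0}")"
  printf 'manzai1: %s\n' "$(format_seconds "${TIME_MANZAI1:-0}")"
  printf 'manzai2: %s\n' "$(format_seconds "${TIME_MANZAI2:-0}")"
  printf 'manzai3: %s\n' "$(format_seconds "${TIME_MANZAI3:-0}")"
}
```

Calling `report_timings` after `TIME_MANZAI3=$SECONDS` would print the four measurements. Because every timed command starts a fresh Python process, each figure includes model load time as well as transcription.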