|
--- |
|
extra_gated_prompt: "This is a BETA model. By using this model, you agree to the [licensing terms](license.md)."
|
language: |
|
- 'no' |
|
license: apache-2.0 |
|
tags: |
|
- audio |
|
- asr |
|
- automatic-speech-recognition |
|
- hf-asr-leaderboard |
|
model-index: |
|
- name: tiny_scream_april_beta |
|
results: [] |
|
--- |
|
|
|
|
|
|
# tiny_scream_april_beta |
|
|
|
This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny), trained on the NbAiLab/NCC_speech_all_v5 dataset. Decoding uses a beam size of 5 (passed as `num_beams` in the examples below).
|
|
|
## Model description |
|
|
|
This is a BETA version. You need to accept [the terms and conditions](license.md) to use it.
|
|
|
## Using the Model |
|
There are several ways to use this model, and we hope people will convert it into other formats as well. The code below uses the Transformers pipeline to transcribe long audio files by chunking them into 30-second segments:
|
|
|
```python
import torch
import librosa
from transformers import pipeline

# Pick a device: "cuda" for NVIDIA GPUs, "mps" for Metal (Mac), "cpu" otherwise
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="NbAiLab/tiny_scream_april_beta",
    chunk_length_s=30,
    device=device,
    max_new_tokens=128,
    generate_kwargs={
        "language": "no",  # Norwegian, matching the model's language tag
        "task": "transcribe",
        "num_beams": 5,  # beam size of 5, as noted above
    },
)

# Load the audio resampled to 16 kHz mono; librosa also handles mp3 and other formats
audio_path = "myfile.wav"
samples, _ = librosa.load(audio_path, sr=16000, mono=True)

# Run the pipeline on the raw samples
prediction = pipe(samples)["text"]
print(prediction)
```
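
If you prefer to work without the pipeline, the model can also be called through the Whisper classes directly. The following is a minimal sketch (it reuses the `samples` array from above and passes `num_beams=5` explicitly to match the beam size described earlier). Note that this path only decodes the first 30 seconds of audio, so use the chunked pipeline above for long files:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the processor (feature extractor + tokenizer) and the model weights
processor = WhisperProcessor.from_pretrained("NbAiLab/tiny_scream_april_beta")
model = WhisperForConditionalGeneration.from_pretrained("NbAiLab/tiny_scream_april_beta")

# Convert the raw 16 kHz samples into log-mel input features (padded/truncated to 30 s)
inputs = processor(samples, sampling_rate=16000, return_tensors="pt")

# Beam-search decoding with a beam size of 5
generated_ids = model.generate(
    inputs.input_features,
    num_beams=5,
    max_new_tokens=128,
    language="no",
    task="transcribe",
)

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```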
|
|
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 8e-05 |
|
- lr_scheduler_type: linear |
|
- per_device_train_batch_size: 48 |
|
- total_train_batch_size_per_node: 192 |
|
- total_train_batch_size: 1536 |
|
- total_optimization_steps: 50000 |
|
- starting_optimization_step: None |
|
- finishing_optimization_step: 50000 |
|
- num_train_dataset_workers: 64 |
|
- total_num_training_examples: 76800000 |
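
The batch and step totals above are internally consistent; the short check below shows the arithmetic (the per-node device count and node count are inferred from the batch sizes, not stated explicitly):

```python
per_device_batch = 48
per_node_batch = 192
total_batch = 1536
steps = 50_000

devices_per_node = per_node_batch // per_device_batch  # 4 devices per node (inferred)
num_nodes = total_batch // per_node_batch              # 8 nodes (inferred)
total_examples = total_batch * steps                   # 1536 * 50000

assert total_examples == 76_800_000  # matches total_num_training_examples
```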
|
|
|
### Training results |
|
|
|
| step  | eval_loss | train_loss | eval_wer | eval_cer |
|:-----:|:---------:|:----------:|:--------:|:--------:|
| 0     | 2.1853    | 2.6128     | 225.2741 | 151.0305 |
| 2500  | 0.8090    | 0.6776     | 26.0049  | 10.4006  |
| 5000  | 0.5674    | 0.5277     | 20.7674  | 8.7327   |
| 7500  | 0.5255    | 0.4551     | 19.3971  | 8.5059   |
| 10000 | 0.5774    | 0.4327     | 18.0877  | 8.0272   |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.28.0.dev0 |
|
- Datasets 2.11.0 |
|
- Tokenizers 0.13.2 |
|
|