|
--- |
|
extra_gated_prompt: "This is a BETA model. By using this model, you agree to the [licensing terms](license.md)."
|
language: |
|
- 'no' |
|
license: apache-2.0 |
|
tags: |
|
- audio |
|
- asr |
|
- automatic-speech-recognition |
|
- hf-asr-leaderboard |
|
model-index: |
|
- name: tiny_scream_april_beta |
|
results: [] |
|
--- |
|
|
|
|
|
|
# tiny_scream_april_beta |
|
|
|
This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny), trained on the NbAiLab/NCC_speech_all_v5 dataset. Decoding uses a beam size of 5 (passed as `num_beams` in the examples below).
|
|
|
## Model description |
|
|
|
This is a BETA version. You need to accept [the terms and conditions](license.md) to use it.
|
|
|
## Using the Model |
|
There are several ways to use this model, and we hope people will convert it into other formats as well. The code below uses the Transformers pipeline to transcribe long audio files by chunking them into 30-second segments:
|
|
|
```python
import torch
import librosa
from transformers import pipeline

# Pick a device: "cuda" for NVIDIA GPUs, "mps" for Metal (Mac), "cpu" otherwise
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="NbAiLab/tiny_scream_april_beta",
    chunk_length_s=30,
    device=device,
    max_new_tokens=128,
    generate_kwargs={
        "language": "no",  # Norwegian, matching the model's language tag
        "task": "transcribe",
        "num_beams": 5,  # beam size of 5, as noted above
    },
)

# Load the audio resampled to 16 kHz mono; librosa also handles mp3 and other formats
audio_path = "myfile.wav"
samples, _ = librosa.load(audio_path, sr=16000, mono=True)

# Run the pipeline on the raw samples
prediction = pipe(samples)["text"]
print(prediction)
```
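
If you prefer to work without the pipeline, the model can also be called through the Whisper classes directly. The following is a minimal sketch (it reuses the `samples` array from above and passes `num_beams=5` explicitly to match the beam size described earlier). Note that this path only decodes the first 30 seconds of audio, so use the chunked pipeline above for long files:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the processor (feature extractor + tokenizer) and the model weights
processor = WhisperProcessor.from_pretrained("NbAiLab/tiny_scream_april_beta")
model = WhisperForConditionalGeneration.from_pretrained("NbAiLab/tiny_scream_april_beta")

# Convert the raw 16 kHz samples into log-mel input features (padded/truncated to 30 s)
inputs = processor(samples, sampling_rate=16000, return_tensors="pt")

# Beam-search decoding with a beam size of 5
generated_ids = model.generate(
    inputs.input_features,
    num_beams=5,
    max_new_tokens=128,
    language="no",
    task="transcribe",
)

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```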
|
|
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 8e-05 |
|
- lr_scheduler_type: linear |
|
- per_device_train_batch_size: 48 |
|
- total_train_batch_size_per_node: 192 |
|
- total_train_batch_size: 1536 |
|
- total_optimization_steps: 50000 |
|
- starting_optimization_step: None |
|
- finishing_optimization_step: 50000 |
|
- num_train_dataset_workers: 64 |
|
- total_num_training_examples: 76800000 |
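
The batch and step totals above are internally consistent; the short check below shows the arithmetic (the per-node device count and node count are inferred from the batch sizes, not stated explicitly):

```python
per_device_batch = 48
per_node_batch = 192
total_batch = 1536
steps = 50_000

devices_per_node = per_node_batch // per_device_batch  # 4 devices per node (inferred)
num_nodes = total_batch // per_node_batch              # 8 nodes (inferred)
total_examples = total_batch * steps                   # 1536 * 50000

assert total_examples == 76_800_000  # matches total_num_training_examples
```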
|
|
|
### Training results |
|
|
|
| step  | eval_loss | train_loss | eval_wer | eval_cer |
|:-----:|:---------:|:----------:|:--------:|:--------:|
| 0     | 2.1853    | 2.6128     | 225.2741 | 151.0305 |
| 2500  | 0.8090    | 0.6776     | 26.0049  | 10.4006  |
| 5000  | 0.5674    | 0.5277     | 20.7674  | 8.7327   |
| 7500  | 0.5255    | 0.4551     | 19.3971  | 8.5059   |
| 10000 | 0.5774    | 0.4327     | 18.0877  | 8.0272   |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.28.0.dev0 |
|
- Datasets 2.11.0 |
|
- Tokenizers 0.13.2 |
|
|