π¨π¨π¨ ATTENTION! π¨π¨π¨
Use an updated model: https://huggingface.co/Yehor/w2v-bert-uk-v2.1
w2v-bert-uk v1
Community
- Discord: https://bit.ly/discord-uds
- Speech Recognition: https://t.me/speech_recognition_uk
- Speech Synthesis: https://t.me/speech_synthesis_uk
Google Colab
You can run this model using a Google Colab notebook: https://colab.research.google.com/drive/1QoKw2DWo5a5XYw870cfGE3dJf1WjZgrj?usp=sharing
Metrics
- AM:
- WER: 0.0727
- CER: 0.0151
- Accuracy: 92.73%
- AM + LM:
- WER: 0.0655
- CER: 0.0139
- Accuracy: 93.45%
Hyperparameters
This model was trained with the following hparams using 2 RTX A4000:
torchrun --standalone --nnodes=1 --nproc-per-node=2 ../train_w2v2_bert.py \
--custom_set ~/cv10/train.csv \
--custom_set_eval ~/cv10/test.csv \
--num_train_epochs 15 \
--tokenize_config . \
--w2v2_bert_model facebook/w2v-bert-2.0 \
--batch 4 \
--num_proc 5 \
--grad_accum 1 \
--learning_rate 3e-5 \
--logging_steps 20 \
--eval_step 500 \
--group_by_length \
--attention_dropout 0.0 \
--activation_dropout 0.05 \
--feat_proj_dropout 0.05 \
--feat_quantizer_dropout 0.0 \
--hidden_dropout 0.05 \
--layerdrop 0.0 \
--final_dropout 0.0 \
--mask_time_prob 0.0 \
--mask_time_length 10 \
--mask_feature_prob 0.0 \
--mask_feature_length 10
Usage
# pip install -U torch soundfile transformers
import torch
import soundfile as sf
from transformers import AutoModelForCTC, Wav2Vec2BertProcessor
# Config
model_name = 'Yehor/w2v-bert-2.0-uk'
device = 'cuda:1' # or cpu
sampling_rate = 16_000
# Load the model
asr_model = AutoModelForCTC.from_pretrained(model_name).to(device)
processor = Wav2Vec2BertProcessor.from_pretrained(model_name)
paths = [
'sample1.wav',
]
# Extract audio
audio_inputs = []
for path in paths:
audio_input, _ = sf.read(path)
audio_inputs.append(audio_input)
# Transcribe the audio
inputs = processor(audio_inputs, sampling_rate=sampling_rate).input_features
features = torch.tensor(inputs).to(device)
with torch.no_grad():
logits = asr_model(features).logits
predicted_ids = torch.argmax(logits, dim=-1)
predictions = processor.batch_decode(predicted_ids)
# Log results
print('Predictions:')
print(predictions)
- Downloads last month
- 37
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.
Model tree for Yehor/w2v-bert-uk
Base model
facebook/w2v-bert-2.0