DeCRED-small

This is a 40M-parameter encoder-decoder Ebranchformer model trained with a decoder-centric regularization technique on 6,000 hours of open-source, normalised English data.

Architecture details, training hyperparameters, and a description of the proposed technique will be added soon.

Disclaimer: The model currently hallucinates on segments that contain only silence, because it was not trained on such data. A fix will be added soon.

The model can be used with the pipeline class to transcribe audio files of arbitrary length.

from transformers import pipeline

model_id = "BUT-FIT/DeCRED-small"
pipe = pipeline("automatic-speech-recognition", model=model_id, feature_extractor=model_id, trust_remote_code=True)
# Transformers versions newer than 4.31.0 have a bug in the pipeline inference type,
# so the type is set explicitly below. The warning can be ignored.
pipe.type = "seq2seq"

# Run beam search decoding with joint CTC-attention scorer
result_beam = pipe("audio.wav")

# Run greedy decoding without joint CTC-attention scorer
pipe.model.generation_config.ctc_weight = 0.0
pipe.model.generation_config.num_beams = 1

result_greedy = pipe("audio.wav")
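
Until the silence issue noted in the disclaimer is fixed, one possible workaround is to skip segments whose energy is very low before passing them to the model. The following is a minimal sketch and not part of the model or its repository: the helper names (`rms_dbfs`, `is_silence`, `transcribe_unless_silent`) and the -50 dBFS threshold are illustrative assumptions, and the samples are assumed to be floats in the range [-1.0, 1.0].

```python
import math
from typing import Callable, Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """Return the RMS level in dBFS of samples scaled to [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    # 10 * log10(mean square) equals 20 * log10(RMS).
    return 10.0 * math.log10(mean_square)


def is_silence(samples: Sequence[float], threshold_dbfs: float = -50.0) -> bool:
    """Hypothetical check: treat a segment as silence if its RMS is below the threshold."""
    return rms_dbfs(samples) < threshold_dbfs


def transcribe_unless_silent(
    samples: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],
    threshold_dbfs: float = -50.0,
) -> str:
    """Return an empty transcript for silent segments, otherwise call `transcribe`."""
    if is_silence(samples, threshold_dbfs):
        return ""
    return transcribe(samples)
```

Here `transcribe` can be any callable that maps samples to text, for example a small wrapper around the `pipe` object created above. The threshold is an assumption and would need tuning for recordings with background noise.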