---
library_name: transformers
license: apache-2.0
pipeline_tag: automatic-speech-recognition
---

# Korla/Wav2Vec2BertForCTC-hsb

## Model Description

**Wav2Vec2BertForCTC-hsb** is a fine-tuned [Wav2Vec2](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self) model with a BERT-style character classification head, adapted for **Upper Sorbian** automatic speech recognition (ASR). It was fine-tuned with a CTC (Connectionist Temporal Classification) loss and transcribes audio in the Upper Sorbian language.

## Usage

This model can be used for speech-to-text tasks on Upper Sorbian audio.

An optional **5-gram language model (`5gram.bin`)** is provided for decoding with an external LM scorer. This n-gram model was trained on a corpus of **Upper Sorbian Holy Masses**, so it can improve decoding accuracy for religious or formal speech.

## Training Data

The model was fine-tuned on a dataset provided by the **Foundation for the Sorbian People**, which consists of high-quality recordings and transcripts in Upper Sorbian. The dataset covers diverse speakers and speech conditions, which is intended to make the acoustic model robust.

## Language Model

- **Name:** `5gram.bin`
- **Type:** 5-gram character-level KenLM language model
- **Domain:** Upper Sorbian religious speech (Holy Masses)
- **Usage:** For decoding with tools such as [CTCDecoder](https://github.com/parlance/ctcdecode).

## Limitations

- The model's accuracy may degrade on informal or highly dialectal speech that is not represented in the training data.
- The language model is domain-specific (religious speech) and may bias decoding toward that context.
- The model supports only **Upper Sorbian**, not Lower Sorbian or other Slavic languages.

## How to Use

For normal use (without the LM), load the model into a `transformers` pipeline. To decode with the 5-gram language model, use the `pyctcdecode` library.

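Both routes can be sketched roughly as follows. This is a minimal, untested sketch rather than the author's reference code, and it rests on several assumptions: that `5gram.bin` sits at the root of this model repository, that the model expects 16 kHz mono audio, and that the tokenizer vocabulary lines up with the model's output logits. The helper names (`labels_from_vocab`, `transcribe`, `transcribe_with_lm`) are illustrative, and `alpha` and `beta` are simply the `pyctcdecode` defaults, not values tuned for this language model.

```python
MODEL_ID = "Korla/Wav2Vec2BertForCTC-hsb"
LM_FILENAME = "5gram.bin"  # assumption: stored at the root of the model repo
SAMPLE_RATE = 16_000       # assumption: the model expects 16 kHz mono audio


def labels_from_vocab(vocab):
    """Return the tokens ordered by their index, the order pyctcdecode expects."""
    return [token for token, _ in sorted(vocab.items(), key=lambda kv: kv[1])]


def transcribe(audio_path):
    """Plain CTC decoding through the transformers pipeline (no LM)."""
    # Heavy dependencies are imported lazily so this file can be imported cheaply.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(audio_path)["text"]


def transcribe_with_lm(audio, alpha=0.5, beta=1.5):
    """Beam-search decoding with the KenLM file via pyctcdecode.

    `audio` is a 1-D float array of mono samples at SAMPLE_RATE.
    `alpha` weights the LM score and `beta` is the word-insertion bonus.
    """
    import torch
    from huggingface_hub import hf_hub_download
    from pyctcdecode import build_ctcdecoder
    from transformers import AutoModelForCTC, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # Run the acoustic model and keep the (time, vocab) logits of the only item.
    inputs = processor(audio, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]

    # Fetch the n-gram model from the Hub and build the beam-search decoder.
    lm_path = hf_hub_download(repo_id=MODEL_ID, filename=LM_FILENAME)
    decoder = build_ctcdecoder(
        labels_from_vocab(processor.tokenizer.get_vocab()),
        kenlm_model_path=lm_path,
        alpha=alpha,
        beta=beta,
    )
    return decoder.decode(logits.cpu().numpy())
```

Calling `transcribe("recording.wav")` (a hypothetical file name) returns the transcript without the LM; passing a file path to the pipeline requires `ffmpeg` to be installed. `transcribe_with_lm` additionally needs the `pyctcdecode` and `kenlm` packages and takes the already-loaded samples, so resample to 16 kHz first if the recording uses a different rate.
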
## Citation

Please cite as:

```bibtex
@misc{korla_wav2vec2bertforctc_hsb,
  author = {Karl Baier},
  title = {Wav2Vec2BertForCTC-hsb},
  year = {2025},
  howpublished = {\url{https://huggingface.co/Korla/Wav2Vec2BertForCTC-hsb}},
}
```