daviddrzik/SK_Morph_BLM

daviddrzik committed on Sep 5, 2024

Commit be16b9b • 1 Parent(s): 2c3ee41

Update README.md

Files changed (1)

README.md +1 -1

@@ -9,7 +9,7 @@ library_name: transformers
 ---
 # Slovak Morphological Baby Language Model (SK_Morph_BLM)
-**SK_Morph_BLM** is a pretrained small language model for the Slovak language, based on the RoBERTa architecture. The model utilizes a custom morphological tokenizer specifically designed for the Slovak language, which focuses on **preserving the integrity of root morphemes**. This tokenizer is not compatible with the standard `RobertaTokenizer` from the Hugging Face library due to its unique approach to tokenization. The model is case-insensitive, meaning it operates in lowercase. While the pretrained model can be used for masked language modeling, it is primarily intended for fine-tuning on downstream NLP tasks.
+**SK_Morph_BLM** is a pretrained small language model for the Slovak language, based on the RoBERTa architecture. The model utilizes a custom morphological tokenizer (**SKMT**, more info [here](https://github.com/daviddrzik/Slovak_subword_tokenizers)) specifically designed for the Slovak language, which focuses on **preserving the integrity of root morphemes**. This tokenizer is not compatible with the standard `RobertaTokenizer` from the Hugging Face library due to its unique approach to tokenization. The model is case-insensitive, meaning it operates in lowercase. While the pretrained model can be used for masked language modeling, it is primarily intended for fine-tuning on downstream NLP tasks.
 ## How to Use the Model