Commit 49db152, committed by matejulcar
Parent(s): 1d00622
added fast tokenizer
README.md
CHANGED
````diff
@@ -9,10 +9,10 @@ Load in transformers library with:
 ```
 from transformers import AutoTokenizer, AutoModelForMaskedLM
 
-tokenizer = AutoTokenizer.from_pretrained("EMBEDDIA/sloberta"
+tokenizer = AutoTokenizer.from_pretrained("EMBEDDIA/sloberta")
 model = AutoModelForMaskedLM.from_pretrained("EMBEDDIA/sloberta")
 ```
-
+
 
 # SloBERTa
 SloBERTa model is a monolingual Slovene BERT-like model. It is closely related to French Camembert model https://camembert-model.fr/. The corpora used for training the model have 3.47 billion tokens in total. The subword vocabulary contains 32,000 tokens. The scripts and programs used for data preparation and training the model are available on https://github.com/clarinsi/Slovene-BERT-Tool
````
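The substantive change in this hunk is the closing parenthesis added to the `tokenizer` line of the README snippet. As a minimal sketch, the standard-library `ast` module can confirm that the old snippet is not valid Python while the new one is. `ast.parse` only builds a syntax tree, so neither `transformers` nor network access is needed. The names `OLD_SNIPPET`, `NEW_SNIPPET` and `parses` are hypothetical, introduced only for this illustration.

```python
import ast

# README snippet before this commit: the tokenizer line lacks its closing parenthesis.
OLD_SNIPPET = '''from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("EMBEDDIA/sloberta"
model = AutoModelForMaskedLM.from_pretrained("EMBEDDIA/sloberta")
'''

# README snippet after this commit.
NEW_SNIPPET = '''from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("EMBEDDIA/sloberta")
model = AutoModelForMaskedLM.from_pretrained("EMBEDDIA/sloberta")
'''


def parses(source: str) -> bool:
    """Hypothetical helper: return True if `source` is syntactically valid Python.

    ast.parse only builds a syntax tree; it does not import transformers
    or download the model, so this check runs offline.
    """
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print("before:", parses(OLD_SNIPPET))  # before: False
print("after:", parses(NEW_SNIPPET))   # after: True
```

Inside the unclosed parenthesis, Python joins the next line onto the call, so the string literal is immediately followed by `model = ...`, which is a syntax error. The old README example therefore failed before `transformers` was even imported.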