Update README.md
README.md CHANGED

@@ -64,9 +64,7 @@ The model was trained on a POSMAC corpus. Polish Open Science Metadata Corpus (P
 
 # Tokenizer
 
-As in the original
-
-We kindly encourage you to use the Fast version of the tokenizer, namely HerbertTokenizerFast.
+As in the original plT5 implementation, the training dataset was tokenized into subwords using a sentencepiece unigram model with vocabulary size of 50k tokens.
 
 # Usage