Update README.md
README.md
@@ -8,5 +8,4 @@ language:
 
 BPE Tokenizer fitted on a custom corpus, with digit separation, byte fallback and other features from LlamaTokenizer.
 
-Only fitted on
-# Warning - Dataset was not shuffled so fitted on code only, not usable as is !
+Only fitted on 1,000,000 samples (7.5M words).
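The README lists the tokenizer's main ingredients but does not show their effect at encoding time. The sketch below is a toy illustration in plain Python. It is not this tokenizer's code and not SentencePiece's implementation: `MERGES` and `VOCAB` are hypothetical, and whitespace handling (the `▁` marker) and normalization are left out. Digit separation is modelled as a pre-split that isolates every digit, so a merge such as `("1", "2")` can never fire. Byte fallback spells any symbol missing from the vocabulary as `<0xNN>` byte tokens instead of mapping it to an unknown token.

```python
import re

# Hypothetical toy data for illustration only (not this tokenizer's real merges/vocab).
MERGES: list[tuple[str, str]] = [("l", "o"), ("lo", "w"), ("1", "2")]
VOCAB: set[str] = {"l", "o", "w", "lo", "low", "e", "r", " ", "1", "2", "3"}


def pre_split(text: str) -> list[str]:
    """Digit separation: each digit is its own chunk, so no merge can span digits."""
    return re.findall(r"\d|\D+", text)


def apply_merges(chunk: str, ranks: dict[tuple[str, str], int]) -> list[str]:
    """Greedy BPE: repeatedly merge the adjacent pair with the lowest merge rank."""
    symbols = list(chunk)
    while len(symbols) > 1:
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in ranks
        ]
        if not candidates:
            break
        _, i = min(candidates)  # lowest rank first, leftmost position on ties
        symbols[i : i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


def byte_fallback(symbol: str) -> list[str]:
    """Byte fallback: spell an out-of-vocabulary symbol as its UTF-8 bytes."""
    return [f"<0x{b:02X}>" for b in symbol.encode("utf-8")]


def encode(text: str, merges=MERGES, vocab=VOCAB) -> list[str]:
    """Pre-split digits, apply BPE merges per chunk, fall back to bytes for OOV symbols."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    tokens: list[str] = []
    for chunk in pre_split(text):
        for symbol in apply_merges(chunk, ranks):
            tokens.extend([symbol] if symbol in vocab else byte_fallback(symbol))
    return tokens


def decode(tokens: list[str]) -> str:
    """Inverse of encode: byte tokens are reassembled before UTF-8 decoding."""
    buf = bytearray()
    for tok in tokens:
        m = re.fullmatch(r"<0x([0-9A-F]{2})>", tok)
        buf.extend(bytes([int(m.group(1), 16)]) if m else tok.encode("utf-8"))
    return buf.decode("utf-8")


print(encode("lower 123 \u00e9"))
# → ['low', 'e', 'r', ' ', '1', '2', '3', ' ', '<0xC3>', '<0xA9>']
```

Here `"123"` stays three single-digit tokens even though a `("1", "2")` merge exists, and `é`, which is absent from the toy vocabulary, survives as two byte tokens that `decode` turns back into the original character. In SentencePiece, for instance, the corresponding training options are `split_digits` and `byte_fallback`; the toolkit used to fit this tokenizer is not specified.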