manu committed on
Commit 13721b2 · 1 parent: f8718a3

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -8,5 +8,4 @@ language:
 
 BPE Tokenizer fitted on a custom corpus, with digit separation, byte fallback and other features from LlamaTokenizer.
 
-Only fitted on 100,000 samples (7.5M words).
-# Warning - Dataset was not shuffled so fitted on code only, not usable as is !
+Only fitted on 1,000,000 samples (7.5M words).