gvlassis committed on
Commit 3d5971c · 1 Parent(s): d7c0260

Added flag

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -6,7 +6,7 @@ tags: []
 # finewebedu_32000
 
 ## About
-🪙 An English tokenizer, trained on the [FineWeb-Edu dataset](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).
+🇬🇧 An English tokenizer, trained on the [FineWeb-Edu dataset](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).
 
 ## Description
 This is a **character-level** (mainly) English (en) tokenizer, trained on the CC-MAIN-2024-10 subset of FineWeb-Edu. It has a vocabulary size of 32,000 ([multiple of 128](https://x.com/karpathy/status/1621578354024677377)), which makes it fast for integration in models.
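
Reviewer note on the "multiple of 128" claim in the Description context line: the arithmetic is easy to check, since 32,000 = 250 × 128. The sketch below is only an illustration and is not part of this commit or the repository. The helper name `pad_to_multiple` and the sample input 50,257 are hypothetical, chosen to show how a vocabulary size that is not aligned would be rounded up.

```python
def pad_to_multiple(vocab_size: int, multiple: int = 128) -> int:
    """Round vocab_size up to the nearest multiple (hypothetical helper)."""
    # Ceiling division without floats: -(-a // b) == ceil(a / b) for positive ints.
    return -(-vocab_size // multiple) * multiple


VOCAB_SIZE = 32_000  # vocabulary size stated in the README

# True when the vocabulary size is already a multiple of 128.
is_aligned = VOCAB_SIZE % 128 == 0
print(is_aligned)

# An unaligned size (illustrative input only) gets padded upward.
print(pad_to_multiple(50_257))
```

Because 32,000 is already aligned, `pad_to_multiple(32_000)` returns it unchanged, so no padding of the embedding table would be needed under this assumption.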