Added flag
README.md
CHANGED
@@ -6,7 +6,7 @@ tags: []
 # finewebedu_32000
 
 ## About
-
+🇬🇧 An English tokenizer, trained on the [FineWeb-Edu dataset](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).
 
 ## Description
 This is a **character-level** (mainly) English (en) tokenizer, trained on the CC-MAIN-2024-10 subset of FineWeb-Edu. It has a vocabulary size of 32,000 ([multiple of 128](https://x.com/karpathy/status/1621578354024677377)), which makes it fast for integration in models.