license: mit
language:
- en
tags:
- text generation
datasets:
- fhswf/TinyStoriesV2_cleaned
BPE Tokenizer for TinyStoriesV2
---
Based on the GPT-Neo BPE tokenizer, but with a smaller vocabulary.
Trained on TinyStoriesV2.
- Vocab Size: 4096
- 256 Base chars
- 1 extra Token: <|endoftext|>
- 3839 merges |
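
The breakdown above adds up as 256 + 1 + 3839 = 4096. The following is a minimal sketch of how a byte-level BPE vocabulary of that shape comes about, not the code used to train this tokenizer. The helper names `vocab_size` and `train_bpe` are hypothetical, the trainer is a toy that ignores pre-tokenization, and the way ids are assigned here is an assumption made for illustration only.

```python
from collections import Counter

BASE_SIZE = 256                      # one entry per byte value
SPECIAL_TOKENS = ["<|endoftext|>"]   # the single extra token listed above


def vocab_size(num_merges, base_size=BASE_SIZE, num_special=len(SPECIAL_TOKENS)):
    """Total vocabulary = base symbols + special tokens + learned merges."""
    return base_size + num_special + num_merges


def train_bpe(text, num_merges):
    """Toy byte-level BPE: repeatedly merge the most frequent adjacent pair.

    Returns the learned merges in order. New ids start at BASE_SIZE, which is
    an illustrative choice and not necessarily how this tokenizer numbers them.
    """
    ids = list(text.encode("utf-8"))
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break  # nothing left to merge
        best = pairs.most_common(1)[0][0]
        new_id = BASE_SIZE + len(merges)
        merges.append(best)
        # Rewrite the sequence, replacing each occurrence of the best pair.
        merged, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == best:
                merged.append(new_id)
                i += 2
            else:
                merged.append(ids[i])
                i += 1
        ids = merged
    return merges


toy_merges = train_bpe("aaabdaaabac", 3)
print(vocab_size(len(toy_merges)))  # size of the toy vocabulary
print(vocab_size(3839))             # size stated in this card
```

Each merge adds exactly one entry to the vocabulary, so a target size is reached by fixing the number of merges: 4096 - 256 - 1 = 3839.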