fix tokenizer explanation
README.md CHANGED
@@ -94,7 +94,7 @@ print(tokenizer.decode(output))
 ## Tokenizer
 
 The tokenizer of this model is based on the [huggingface/tokenizers](https://github.com/huggingface/tokenizers) Unigram byte-fallback model.
-The vocabulary entries were converted from [`llm-jp-tokenizer v2.2 (
+The vocabulary entries were converted from [`llm-jp-tokenizer v2.2 (100k: code20K_en40K_ja60K.ver2.2)`](https://github.com/llm-jp/llm-jp-tokenizer/releases/tag/v2.2).
 Please refer to the [README.md](https://github.com/llm-jp/llm-jp-tokenizer) of `llm-jp-tokenizer` for details on the vocabulary construction procedure (pure SentencePiece training does not reproduce our vocabulary).
 
 - **Model:** Hugging Face Fast Tokenizer using a Unigram byte-fallback model, which requires `tokenizers>=0.14.0`
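For context, here is a minimal sketch of loading and exercising the tokenizer this hunk documents. The model id `llm-jp/llm-jp-13b-v1.0` is an assumption for illustration only (it is not named in this diff; substitute the checkpoint this README ships with), and the sample text is arbitrary:

```python
# Minimal sketch, not from this diff. Requires transformers with
# tokenizers>=0.14.0 installed, per the README line above.
from transformers import AutoTokenizer

# Assumed checkpoint id; replace with the model this README belongs to.
tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-13b-v1.0")

# Unigram byte-fallback: characters absent from the 100k vocabulary fall
# back to raw byte tokens rather than <unk>, so decode() round-trips the
# input exactly.
text = "自然言語処理を勉強しています。"
ids = tokenizer.encode(text)
print(ids)
print(tokenizer.decode(ids))
```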