Tiktoken cl100k_base / GPT-4 Tokenizer

Conversion script

Modified from https://gist.github.com/xenova/a452a6474428de0182b17605a98631ee.
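The conversion can be sanity-checked against tiktoken itself: for ordinary text containing no special tokens, the converted tokenizer should return exactly the same token IDs as the reference cl100k_base encoding. A minimal sketch, assuming the tiktoken package is installed alongside transformers:

import tiktoken
import transformers

# Reference encoding from OpenAI's tiktoken and the converted Hugging Face tokenizer.
enc = tiktoken.get_encoding("cl100k_base")
tokenizer = transformers.AutoTokenizer.from_pretrained("DWDMaiMai/tiktoken_cl100k_base")

# For plain text the two should agree token for token.
text = "hello world!"
assert enc.encode(text) == tokenizer.encode(text)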

Example usage:

import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("DWDMaiMai/tiktoken_cl100k_base")
assert [15339, 1917, 0] == tokenizer.encode("hello world!")

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]
assert """<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing great. How can I help you today?<|im_end|>
<|im_start|>user
I'd like to show off how chat templating works!<|im_end|>
<|im_start|>assistant
""" == tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
