
What is that instruction template?

#1
by SerialKicked - opened

What is that instruction template? It makes very little sense. Your model has ChatML fully tokenized, but you don't even use it; instead you use non-tokenized markers. It has only 4096 context length, AND you're wasting half of it on the instruction template? I don't get it.

Hey @SerialKicked, the current template comes from Tulu 1 and doesn't use custom chat tokens, so we avoid modifying the tokenizer during training, which keeps things simpler. The template is lightweight (~5 tokens per turn), but we're open to exploring optimizations, including custom chat tokens, in the future.
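
For reference, here is a rough way to measure the per-turn template overhead yourself (a minimal sketch; the model ID below is an assumption, swap in the actual checkpoint this discussion is attached to):

```python
# Minimal sketch: estimate how many tokens the chat template adds per turn.
# The checkpoint name is illustrative, not confirmed by this thread.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B-Instruct")

messages = [{"role": "user", "content": "Hello"}]

# Token IDs for the message wrapped in the chat template vs. the bare text.
templated = tokenizer.apply_chat_template(messages, tokenize=True)
bare = tokenizer("Hello")["input_ids"]

# The difference approximates the template's overhead for a single turn.
print(f"template overhead: {len(templated) - len(bare)} tokens")
```

If the overhead printed is in the single digits per turn, the template costs far less than half of the 4096-token context, even over a long multi-turn conversation.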

natolambert changed discussion status to closed
