What is that instruction template?
#1
by
SerialKicked
- opened
What is that instruction template? It makes very little sense. Your model has ChatML being fully tokenized but you don't even use it, instead you use non tokenized markers. It has only 4096 context length AND you're wasting half on it on the instruction template? I don't get it.
Hey @SerialKicked , the current template comes from Tulu 1 and doesn’t use custom chat tokens to avoid modifying the tokenizer during training, which keeps things simpler. While the template is lightweight (~5 tokens per turn), we’re open to exploring optimizations, including custom chat tokens, in the future.
natolambert
changed discussion status to
closed