Chat Template is broken?

#2
by Erland - opened

I tried deploying this model using llama-cpp-python, but it won't stop generating and runs all the way to the maximum token limit. I use ChatML for the template since I can't pass a custom chat template to llama-cpp. I'm new to llama.cpp, so maybe I'm doing something wrong, because the model outputs correctly when I run it through Hugging Face.
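
Roughly, this is the kind of call I mean (a minimal sketch, not my exact code; the model path and messages are placeholders):

```python
from llama_cpp import Llama

# Load the GGUF quant with llama-cpp-python's built-in ChatML chat format,
# since I couldn't find a way to pass the model's own chat template here.
llm = Llama(
    model_path="path/to/model-q8_0.gguf",  # placeholder path
    n_ctx=4096,
    chat_format="chatml",
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=512,
)

# With this setup the generation keeps going until max_tokens instead of stopping.
print(out["choices"][0]["message"]["content"])
```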

Thank you in advance!

I have no problem with this in LM Studio using ChatML. Which quant are you using? (I have had many issues with llama-cpp-python chat templates before.)

At first I was using the Q8; when the non-stopping error happened I tried the fp16, but it still gives the same error.

I'll try plain llama.cpp and get back to you.

This is the Q5 in LM Studio:

[screenshot: the Q5 quant generating and stopping correctly in LM Studio]

The quantized models clearly do stop; it's usually the application you use and how it handles the stop token and chat template.
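
For what it's worth, with llama-cpp-python you can usually work around this by passing the ChatML end-of-turn marker as an explicit stop string; a minimal sketch (the model path is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/model-q5_k_m.gguf",  # placeholder path
    n_ctx=4096,
    chat_format="chatml",
)

# Passing "<|im_end|>" explicitly guards against the application ignoring the
# model's end-of-turn token, which is what lets generation run to max_tokens.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=128,
    stop=["<|im_end|>"],
)

print(out["choices"][0]["message"]["content"])
```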
