Chat template with system prompt
Hi, thanks for your great models.
For anyone who wants to use the model with an OpenAI-compatible API (using vLLM), you may need to modify the chat template and provide it in the vllm serve command (for example via the --chat-template option); otherwise you need to pass it in the chat_template argument of the request directly.
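For example, a request-side sketch with the OpenAI Python client against a vLLM server (the extra_body chat_template field is a vLLM-specific extension and may depend on your vLLM version; teuken_chat_template.jinja is just a placeholder name for the modified template below):

from openai import OpenAI

client = OpenAI(base_url="http://HOST:PORT/v1", api_key="EMPTY")

# load the modified template (shown below) from a file and pass it per request
with open("teuken_chat_template.jinja") as f:
    chat_template = f.read()

response = client.chat.completions.create(
    model="openGPT-X/Teuken-7B-instruct-commercial-v0.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant named Teuken."},
        {"role": "user", "content": "Wer bist du?"},
    ],
    extra_body={"chat_template": chat_template},  # vLLM-specific request field
)
print(response.choices[0].message.content)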
The default chat template doesn't support a system prompt.
Here is a modified one that concatenates a user-provided system prompt onto the default German system prompt.
{%- set default_system_prompt = 'Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz. Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen.' %}
{%- for message in messages %}
{%- if loop.first and message['role']|lower != 'system' %}
{{- 'System: ' + default_system_prompt + '\n' }}
{%- endif %}
{%- if message['role']|lower == 'user' %}
{{- message['role']|capitalize + ': ' + message['content'] + '\n' }}
{%- elif message['role']|lower == 'assistant' %}
{{- message['role']|capitalize + ': ' + message['content'] + eos_token + '\n' }}
{%- elif message['role']|lower == 'system' %}
{{- 'System: ' + default_system_prompt + ' ' + message['content'] + '\n' }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- 'Assistant: ' }}
{%- endif %}
Note that by modifying the chat template, the behavior of the model might change as well. Use with caution!
You can verify it with this code (assuming the tokenizer has been loaded and the modified template above has been assigned to tokenizer.chat_template):
messages = [
    {"role": "system", "content": "You are a helpful assistant named Teuken."},
    {"role": "User", "content": "Wer bist du?"},
    {"role": "Assistant", "content": "Ich bin Teuken."},
    {"role": "User", "content": "Wann bist du geboren?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
It will output:
System: Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz. Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen. You are a helpful assistant named Teuken.
User: Wer bist du?
Assistant: Ich bin Teuken.</s>
User: Wann bist du geboren?
Assistant:
Hi @tattrongvu, thanks for your investigations. We have now added a default system prompt, which is used if no chat_template language is provided: https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4/commit/2fcfcda2db6e3529f3c7676fd5a41675deaecd37
We also added a sample for usage with the vLLM server to the README: https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4#usage-with-vllm-server
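For reference, a minimal sketch of the two call styles (the "DE" language key follows the usage shown in the linked README; treat it as an assumption here):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "openGPT-X/Teuken-7B-instruct-commercial-v0.4", trust_remote_code=True
)
messages = [{"role": "User", "content": "Wer bist du?"}]

# without a chat_template language, the new default system prompt is used
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

# explicitly selecting the German template (assumed language key from the README)
print(tokenizer.apply_chat_template(messages, chat_template="DE", tokenize=False, add_generation_prompt=True))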
Please revisit this again. As of now, you cannot simply use a custom system prompt when using the tokenizer's apply_chat_template function.
messages = [{"role": "System", "content": "Answer with 'Pong'"}, {"role": "User", "content": "Ping"}]
chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Error: TemplateError: Roles must alternate User/Assistant/User/Assistant/...
I would suggest this as the default chat template:
{%- for message in messages %}
{%- set role = message['role']|lower %}
{%- if role == 'system' %}
{{- 'System: ' + message['content'] + '\n' }}
{%- elif role in ['user', 'assistant'] %}
{{- message['role']|capitalize + ': ' + message['content'] + ('\n' if role == 'user' else eos_token + '\n') }}
{%- else %}
{{- raise_exception('Only user, assistant and system roles are supported!') }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- 'Assistant: ' }}
{%- endif %}
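A quick way to sanity-check the suggested template is to render it directly with Jinja2, the same engine apply_chat_template uses (a sketch; suggested_chat_template.jinja is a placeholder file containing the template above):

from jinja2 import Environment

def raise_exception(message):
    # mimics the raise_exception helper that transformers exposes to chat templates
    raise ValueError(message)

with open("suggested_chat_template.jinja") as f:
    template_source = f.read()

env = Environment()
env.globals["raise_exception"] = raise_exception

rendered = env.from_string(template_source).render(
    messages=[
        {"role": "System", "content": "Answer with 'Pong'"},
        {"role": "User", "content": "Ping"},
    ],
    add_generation_prompt=True,
    eos_token="</s>",
)
print(rendered)
# System: Answer with 'Pong'
# User: Ping
# Assistant: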
There is also a related issue when hosting the model with Hugging Face TGI:
docker run \
--rm \
--gpus all \
--shm-size 1g \
-p 8080:80 \
ghcr.io/huggingface/text-generation-inference:2.4.1 \
--model-id openGPT-X/Teuken-7B-instruct-commercial-v0.4 \
--trust-remote-code \
--num-shard 2 \
--max-batch-total-tokens 9502 \
--max-batch-prefill-tokens 9502 \
--max-input-tokens 9500 \
--max-total-tokens 9501 \
--max-client-batch-size 1 \
--max-batch-size 1
Sending a chat completion request to this server then fails with a template error:
from huggingface_hub import InferenceClient

client = InferenceClient(base_url="http://HOST:PORT")

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[{"role": "user", "content": "What is deep learning?"}],
)
# Error: Template error: template not found
However, calling text_generation with a manually formatted prompt works perfectly:
client.text_generation('System: You are an AI\nUser: What is deep learning?\nAssistant: ', max_new_tokens=1000)
# Output: Deep learning is a subfield of machine learning that uses artificial...
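Until the chat endpoint works, one possible workaround is to render the prompt client-side with the tokenizer and send the resulting string through text_generation (a sketch; the chat_template="DE" language key is an assumption based on the model's README):

from huggingface_hub import InferenceClient
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "openGPT-X/Teuken-7B-instruct-commercial-v0.4", trust_remote_code=True
)
client = InferenceClient(base_url="http://HOST:PORT")

messages = [{"role": "User", "content": "What is deep learning?"}]
prompt = tokenizer.apply_chat_template(
    messages, chat_template="DE", tokenize=False, add_generation_prompt=True
)
print(client.text_generation(prompt, max_new_tokens=1000))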
Hi all, has this been solved in some way now? Can I also ask: if I would like to use one of the above chat templates, how would I actually do it? (I am deploying on runpod.io, where I can only modify environment variables and how the container is called.)
My ultimate goal is to use the OpenAI library for running it. Which template would you suggest for this?