
Can't get it up and running

#2
by StephanNoller - opened

Hi all, I wanted to get Teuken up and running under vLLM, but everything has failed so far. I am running a pod on RunPod that serves Occiglot without problems and tried the same settings for Teuken, but it is somehow not working (the logs look fine, but there is still no response). These are my settings: --host 0.0.0.0 --port 8000 --model openGPT-X/Teuken-7B-instruct-commercial-v0.4 --dtype bfloat16 --enforce-eager --gpu-memory-utilization 0.95 --trust-remote-code.
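(For reference, those flags correspond to a launch command along these lines, a sketch assuming vLLM's OpenAI-compatible server entrypoint; adjust to however your pod invokes vLLM:)

python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 --model openGPT-X/Teuken-7B-instruct-commercial-v0.4 --dtype bfloat16 --enforce-eager --gpu-memory-utilization 0.95 --trust-remote-code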
I also tried to set it up here on HF as an Inference Endpoint, but that does not work either. On HF I suspect that my --trust-remote-code setting is not being passed through, but I'm not sure. Any ideas? Or recommendations on where I can deploy it in a better way? Could it be that I am running into problems because of the chat template? (I'm using the OpenAI API.) Thanks in advance,
Stephan

OK, it runs now. It turns out that everything was actually up and running, but I do seem to have issues with the chat template: the roles "user" and "assistant" are apparently not accepted. At the moment it only works if I pass a single message with the role "user". How can I change this?

btw, I am calling it like this (I took the example from the README/model card):

from openai import OpenAI

# Point the OpenAI client at the local vLLM server.
client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
)
completion = client.chat.completions.create(
    model="openGPT-X/Teuken-7B-instruct-research-v0.4",
    messages=[{"role": "User", "content": "Hallo"}],
    # Select the German chat template shipped with the model.
    extra_body={"chat_template": "DE"},
)
print(f"Assistant: {completion}")

StephanNoller changed discussion status to closed
StephanNoller changed discussion status to open

Hi @StephanNoller , thanks for investigating. For this instruction-tuned model, we used the selection of system messages listed here during training: https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4/blob/main/gptx_tokenizer.py#L432

We suggest using these system messages for inference as well. If you want to test custom system messages, you can specify a custom chat template with vLLM, as described at https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4#usage-with-vllm-server, and set the --chat-template parameter to the following Jinja template:

{%- if messages[0]["role"] == "system" %}
{{- messages[0]['role']|capitalize + ': ' + messages[0]['content'] + '\n' }}
{%- set loop_messages = messages[1:] %}
{%- else %}
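{#- Default system prompt; in English: "A conversation between a human and an artificial-intelligence assistant. The assistant gives helpful and polite answers to the human's questions." -#}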
System: Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz. Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen.{{- '\n'}}
{%- set loop_messages = messages %}
{%- endif %}
{%- for message in loop_messages %}
{%- if message['role']|lower == 'user' %}
{{- message['role']|capitalize + ': ' + message['content'] + '\n' }}
{%- elif message['role']|lower == 'assistant' %}
{{- message['role']|capitalize + ': ' + message['content'] + '</s>' + '\n' }}
{%- else %}
{{- raise_exception('Only user and assistant roles are supported!') }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- 'Assistant: '}}
{%- endif %}
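Saved to a file (e.g. teuken_template.jinja; the file name here is just an example) and passed to the server via --chat-template, the template above accepts lowercase roles directly. A minimal sketch of a multi-turn request against such a server follows; the assistant turn is made up for illustration:

from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
)
# The server-side template now handles the formatting, so no extra_body is
# needed; it lowercases roles for comparison and capitalizes them on output.
completion = client.chat.completions.create(
    model="openGPT-X/Teuken-7B-instruct-commercial-v0.4",
    messages=[
        {"role": "user", "content": "Hallo!"},
        {"role": "assistant", "content": "Hallo! Wie kann ich dir helfen?"},
        {"role": "user", "content": "Erzähl mir bitte einen Witz."},
    ],
)
print(f"Assistant: {completion.choices[0].message.content}")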
