# Prompt Template
The prompt templates in XTuner are kept consistent with each LLM's official template. Below, we elaborate on the logic using the InternLM-Chat model (`internlm_chat`) as an example.
## Structure
```python
internlm_chat = dict(
    SYSTEM='<|System|>:{system}\n',
    INSTRUCTION='<|User|>:{input}<eoh>\n<|Bot|>:',
    SUFFIX='<eoa>',
    SUFFIX_AS_EOS=True,
    SEP='\n',
    STOP_WORDS=['<eoa>'])
```
- `SYSTEM`: The template for the "system" field during Q&A, where `{system}` represents the "system" text. Note that this field appears only once in a multi-turn dialogue, in the first turn.
- `INSTRUCTION`: The template for the "instruction" field during Q&A, where `{input}` represents the user instruction text.
- `SUFFIX`: The suffix for the "instruction" field, appended to the "response" of each Q&A turn. Typically, this also serves as a special ending symbol (i.e., `eos`). Defaults to `''`.
- `SUFFIX_AS_EOS`: Whether the aforementioned suffix acts as an ending symbol. If set to `True`, it replaces the `eos_token` of the `tokenizer`; otherwise, the `eos_token` of the `tokenizer` is still used to denote the end of a sequence. Defaults to `False`.
- `SEP`: Used to separate multi-turn dialogues; it is appended after `INSTRUCTION` and `SUFFIX`. Defaults to `''`.
- `STOP_WORDS`: Specifies the stop words, which are used during the text generation stage. Note that the `eos_token` of the `tokenizer` is automatically added to `STOP_WORDS`, so it does not need to be set manually.
## Results
### Single-turn
```
<|System|>:{system}
<|User|>:{input}<eoh>
<|Bot|>:{output}<eoa>
```
### Multi-turn
```
<|System|>:{system}
<|User|>:{input}<eoh>
<|Bot|>:{output}<eoa>
<|User|>:{input}<eoh>
<|Bot|>:{output}<eoa>
<|User|>:{input}<eoh>
<|Bot|>:{output}<eoa>
```
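At inference time, `STOP_WORDS` is what tells the generator to stop once `<eoa>` is produced. Below is a hedged sketch using Hugging Face `transformers`; the model name and generation settings are illustrative, and it assumes `<eoa>` is registered as a token in the model's tokenizer.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-chat-7b', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    'internlm/internlm-chat-7b', trust_remote_code=True)

prompt = ('<|System|>:You are a helpful assistant.\n'
          '<|User|>:Hi!<eoh>\n<|Bot|>:')
inputs = tokenizer(prompt, return_tensors='pt')
# '<eoa>' marks the end of each response, so stop generation on its id.
eoa_id = tokenizer.convert_tokens_to_ids('<eoa>')
outputs = model.generate(**inputs, max_new_tokens=128, eos_token_id=eoa_id)
print(tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:]))
```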
## Choosing the prompt template
Model | Prompt Template |
---|---|
baichuan-inc/Baichuan-7B | default* |
baichuan-inc/Baichuan-13B-Base | default* |
baichuan-inc/Baichuan-13B-Chat | baichuan_chat |
baichuan-inc/Baichuan2-7B-Base | default* |
baichuan-inc/Baichuan2-7B-Chat | baichuan2_chat |
baichuan-inc/Baichuan2-13B-Base | default* |
baichuan-inc/Baichuan2-13B-Chat | baichuan2_chat |
THUDM/chatglm2-6b | chatglm2 |
THUDM/chatglm3-6b | chatglm3 |
THUDM/chatglm3-6b-base | chatglm3 |
deepseek-ai/deepseek-coder-6.7b-base | deepseek_coder |
deepseek-ai/deepseek-coder-6.7b-instruct | deepseek_coder |
internlm/internlm-7b | default* |
internlm/internlm-20b | default* |
internlm/internlm-chat-7b | internlm_chat |
internlm/internlm-chat-20b | internlm_chat |
huggyllama/llama-7b | default |
meta-llama/Llama-2-7b-hf | llama2_chat |
meta-llama/Llama-2-7b-chat-hf | llama2_chat |
meta-llama/Llama-2-70b-hf | llama2_chat |
lmsys/vicuna-7b-v1.5 | vicuna |
lmsys/vicuna-13b-v1.5 | vicuna |
mistralai/Mistral-7B-v0.1 | mistral |
mistralai/Mixtral-8x7B-v0.1 | mixtral |
mistralai/Mixtral-8x7B-Instruct-v0.1 | mixtral |
Qwen/Qwen-1_8B | default* |
Qwen/Qwen-1_8B-Chat | qwen_chat |
Qwen/Qwen-7B | default* |
Qwen/Qwen-7B-Chat | qwen_chat |
Qwen/Qwen-72B | default* |
Qwen/Qwen-72B-Chat | qwen_chat |
bigcode/starcoder | default |
01-ai/Yi-6B | default |
01-ai/Yi-34B | default |
HuggingFaceH4/zephyr-7b-beta | zephyr |
deepseek-ai/deepseek-moe-16b-base | deepseek_moe |
deepseek-ai/deepseek-moe-16b-chat | deepseek_moe |
internlm/internlm2-1_8b | default* |
internlm/internlm2-7b | default* |
internlm/internlm2-20b | default* |
internlm/internlm2-chat-1_8b | internlm2_chat |
internlm/internlm2-chat-7b | internlm2_chat |
internlm/internlm2-chat-20b | internlm2_chat |
Qwen/Qwen1.5-0.5B | default* |
Qwen/Qwen1.5-0.5B-Chat | qwen_chat |
Qwen/Qwen1.5-1.8B | default* |
Qwen/Qwen1.5-1.8B-Chat | qwen_chat |
Qwen/Qwen1.5-4B | default* |
Qwen/Qwen1.5-4B-Chat | qwen_chat |
Qwen/Qwen1.5-7B | default* |
Qwen/Qwen1.5-7B-Chat | qwen_chat |
Qwen/Qwen1.5-14B | default* |
Qwen/Qwen1.5-14B-Chat | qwen_chat |
Qwen/Qwen1.5-72B | default* |
Qwen/Qwen1.5-72B-Chat | qwen_chat |
google/gemma-2b | default* |
google/gemma-2b-it | gemma* |
google/gemma-7b | default* |
google/gemma-7b-it | gemma* |
*: The official template contains special tokens (like `<|im_start|>`, `<|im_end|>`) that were not trained during the pre-training phase. Therefore, these models use the `default` template.
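In a training config, the template chosen from the table is referenced by name. A minimal sketch, assuming `PROMPT_TEMPLATE` is importable from `xtuner.utils` and exposes the templates as attributes, as in recent XTuner releases:

```python
# Select a template from the table above for use in an XTuner config.
# Assumes `xtuner.utils.PROMPT_TEMPLATE` exposes templates by name.
from xtuner.utils import PROMPT_TEMPLATE

prompt_template = PROMPT_TEMPLATE.internlm_chat
print(prompt_template['INSTRUCTION'])  # '<|User|>:{input}<eoh>\n<|Bot|>:'
```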