Weird output with instruction following
#4
by
ndurkee
- opened
I'm trying to use this model to follow instructions and give single word responses. Here is an example prompt
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
### Instruction:
Is this a valid sentence?
"Biology1544-91731545-7885Public Library of Science San Francisco,"
Answer yes or no and nothing else.
### Response:
The response I'm getting is
[Answer to the question]
[Answer]
No
Occasionally I'll get
[Answer] No
or
[Question edited for clarity: "Is this a valid sentence? 'Biology 1544-9173 1545-7885 Public Library of Science San Francisco,'"]
Answer: No
Model: bagel-dpo-34b-v0.2
Precision: 16-bit bfloat
Engine: VLLM api server
Settings:
{"prompt": '',
"stream": False,
"max_tokens": 4096,
"temperature": 0.8,
"top_k": -1,
"top_p": 0.9,
"frequency_penalty": 0.2,
"presence_penalty": 0.1,
}
My best guess is that one of your datasets is set up improperly and isn't actually training on the proper data. I've noticed similar behavior with other queries (I can't share due to proprietary information). I did notice that if I prepend the prompt with "Answer" that it works properly.
I would recommend two things, if using alpaca format for this type of prompt:
- remove the system prompt before the instruction
- try using a slightly different prompt that targets true/false (since the model includes boolean questions dataset)
For example:
### Instruction:
True or false - The following is a valid sentence: "Biology1544-91731545-7885Public Library of Science San Francisco,"
### Response:
Example with fastchat cli:
root@428fe25afa00:/workspace# python -m fastchat.serve.cli --model-path ./bagel-dpo-34b-v0.2 --conv-template alpaca --num-gpus 2
Loading checkpoint shards: 100%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 15/15 [00:23<00:00, 1.57s/it]
### Instruction: True or false - The following is a valid sentence: "Biology1544-91731545-7885Public Library of Science San Francisco,"
### Response: false
### Instruction: True or false - The following is a valid sentence: "Hello, how are you today?"
### Response: true