Has anyone else encountered this "bug"?

#32
by mdpi-ai - opened

Bug

For a prompt formatted according to the guidelines, at a specific seed the model generates an endless run of dashes instead of a coherent reply.

prompt: a long prompt (which I am not willing to disclose publicly) following the Llama 3 instruction template

that is: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
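For clarity, here is a minimal sketch of how the prompt string is assembled from that template (the function name and variable names are my own; this simply reproduces the template quoted above):

```python
# Llama 3 instruction template, exactly as quoted above.
LLAMA3_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
)

def build_prompt(system_prompt: str, user_message: str) -> str:
    """Fill the Llama 3 instruct template with a system prompt and user message."""
    return LLAMA3_TEMPLATE.format(
        system_prompt=system_prompt,
        user_message=user_message,
    )
```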

model reply: "---------------------------------"...

expected reply: a meaningful text replying to the user question

Configuration

Running this model on HF text generation inference endpoint with the following config:
- MODEL_ID=Meta-Llama-3-70B-Instruct
- NUM_SHARD=2
- MAX_TOTAL_TOKENS=8192
- MAX_INPUT_LENGTH=6144
- HUGGING_FACE_HUB_TOKEN=${HF_TOKEN:-none}
- MAX_BATCH_PREFILL_TOKENS=6144
- CUDA_MEMORY_FRACTION=0.8
- MAX_TOP_N_TOKENS=30
- ENABLE_CUDA_GRAPHS
- QUANTIZE=eetq

CUDA 12.2
GPUS: 2x Nvidia A100 80Gb

Using the LangChain client HuggingFaceEndpoint via its invoke function

with the following generation parameters:
top_k=None
top_p=0.95
temperature=0.6
stop=["<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>", "<|reserved_special_token"]
max_new_tokens=1000
return_only_new_tokens=True
frequency_penalty=None
repetition_penalty=None
seed=42
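The call above can be sketched as follows; this is an assumption of how the client was wired up (the endpoint URL and token are placeholders, and I am guessing the `stop` list maps to HuggingFaceEndpoint's `stop_sequences` field), not the author's exact code:

```python
# Generation parameters as listed above.
generation_kwargs = {
    "top_k": None,
    "top_p": 0.95,
    "temperature": 0.6,
    "stop_sequences": [
        "<|start_header_id|>", "<|end_header_id|>",
        "<|eot_id|>", "<|reserved_special_token",
    ],
    "max_new_tokens": 1000,
    "repetition_penalty": None,
    "seed": 42,
}

def make_llm(endpoint_url: str, token: str):
    # Imported lazily so the parameter dict above can be inspected
    # without LangChain installed.
    from langchain_community.llms import HuggingFaceEndpoint
    return HuggingFaceEndpoint(
        endpoint_url=endpoint_url,          # placeholder TGI endpoint URL
        huggingfacehub_api_token=token,     # placeholder HF token
        **generation_kwargs,
    )

# Usage (hypothetical):
# reply = make_llm("https://<your-endpoint>", "hf_...").invoke(prompt)
```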

I wonder if I am the only one experiencing this; I can share the prompt privately for reproducibility purposes.

mdpi-ai changed discussion status to closed
