Has anyone else encountered this "bug"?
Bug
For a prompt formatted following the guidelines, the model at a specific seed generates an endless list of dashes instead of a sensible reply.
prompt: a long prompt I am not willing to disclose publicly, following the Llama 3 instruction template (a minimal assembly sketch is given below the expected reply),
that is: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>
{{ user_message }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
model reply: "---------------------------------"...
expected reply: meaningful text answering the user's question
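For reproducibility, here is a minimal Python sketch of how a prompt in this template can be assembled; the actual system prompt and user message are private, so the strings below are placeholders:

```python
# Minimal sketch: assembling a Llama 3 instruct prompt per the template
# quoted above. The contents passed in are placeholders, not the actual
# (private) prompt from this report.
def build_llama3_prompt(system_prompt: str, user_message: str) -> str:
    # The "\n\n" after each <|end_header_id|> follows Meta's published
    # template spacing.
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt("You are a helpful assistant.", "...")
```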
Configuration
Running this model on a Hugging Face Text Generation Inference (TGI) endpoint with the following configuration (a compose-file sketch follows the list):
- MODEL_ID=Meta-Llama-3-70B-Instruct
- NUM_SHARD=2
- MAX_TOTAL_TOKENS=8192
- MAX_INPUT_LENGTH=6144
- HUGGING_FACE_HUB_TOKEN=${HF_TOKEN:-none}
- MAX_BATCH_PREFILL_TOKENS=6144
- CUDA_MEMORY_FRACTION=0.8
- MAX_TOP_N_TOKENS=30
- ENABLE_CUDA_GRAPHS
- QUANTIZE=eetq
CUDA 12.2
GPUs: 2x NVIDIA A100 80GB
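The environment variables above are in docker-compose style; a minimal service definition consistent with them might look like the following (the image tag, volume path, port mapping, and shm_size are my assumptions, not taken from the report):

```yaml
# Hypothetical docker-compose service reconstructed from the settings above.
# Image tag, volume, port, and shm_size are assumptions.
services:
  tgi:
    image: ghcr.io/huggingface/text-generation-inference:2.0.2
    environment:
      - MODEL_ID=Meta-Llama-3-70B-Instruct
      - NUM_SHARD=2
      - MAX_TOTAL_TOKENS=8192
      - MAX_INPUT_LENGTH=6144
      - HUGGING_FACE_HUB_TOKEN=${HF_TOKEN:-none}
      - MAX_BATCH_PREFILL_TOKENS=6144
      - CUDA_MEMORY_FRACTION=0.8
      - MAX_TOP_N_TOKENS=30
      - ENABLE_CUDA_GRAPHS
      - QUANTIZE=eetq
    volumes:
      - ./data:/data
    ports:
      - "8080:80"
    shm_size: 1gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
```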
Using the LangChain HuggingFaceEndpoint client through the invoke function,
with the following generation parameters (a client-side sketch follows the list):
top_k=None
top_p=0.95
temperature=0.6
stop=["<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>", "<|reserved_special_token"]
max_new_tokens=1000
return_only_new_tokens=True
frequency_penalty=None
repetition_penalty=None
seed=42
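For completeness, here is a hedged sketch of the client call, assuming the community LangChain wrapper; the endpoint URL and token are placeholders, `do_sample=True` is my assumption (implied by temperature/top_p/seed), and `return_only_new_tokens` from the list above is rendered as its closest wrapper equivalent, `return_full_text=False`:

```python
# Sketch of the generation call via the LangChain community wrapper.
# Endpoint URL and token are placeholders; see the lead-in for assumptions.
from langchain_community.llms import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url="https://<your-tgi-endpoint>",  # placeholder
    huggingfacehub_api_token="<HF_TOKEN>",       # placeholder
    do_sample=True,  # assumption: sampling is implied by the params below
    top_k=None,
    top_p=0.95,
    temperature=0.6,
    stop_sequences=[
        "<|start_header_id|>",
        "<|end_header_id|>",
        "<|eot_id|>",
        "<|reserved_special_token",
    ],
    max_new_tokens=1000,
    return_full_text=False,  # closest equivalent of return_only_new_tokens=True
    repetition_penalty=None,
    seed=42,
)

reply = llm.invoke(prompt)  # `prompt` assembled as in the sketch above
```

With the seed fixed at 42, the degenerate dash output should be deterministic, which is what would make the run reproducible once the prompt is shared.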
I wonder if I am the only one experiencing this; I can share the prompt in private for reproducibility purposes.