Has anyone else encountered this "bug"?
Bug
For a prompt formatted following the guidelines, the model at a specific seed generates an endless list of dashes instead of a sensible reply.
prompt: a long prompt I am not willing to disclose publicly, following the Llama 3 instruction template (a minimal assembly sketch is given below the expected reply),
that is: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>
{{ user_message }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
model reply: "---------------------------------"...
expected reply: meaningful text answering the user's question
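For reproducibility, here is a minimal Python sketch of how a prompt in this template can be assembled; the actual system prompt and user message are private, so the strings below are placeholders:

```python
# Minimal sketch: assembling a Llama 3 instruct prompt per the template
# quoted above. The contents passed in are placeholders, not the actual
# (private) prompt from this report.
def build_llama3_prompt(system_prompt: str, user_message: str) -> str:
    # The "\n\n" after each <|end_header_id|> follows Meta's published
    # template spacing.
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt("You are a helpful assistant.", "...")
```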
Configuration
Running this model on a Hugging Face Text Generation Inference (TGI) endpoint with the following configuration (a compose-file sketch follows the list):
- MODEL_ID=Meta-Llama-3-70B-Instruct
- NUM_SHARD=2
- MAX_TOTAL_TOKENS=8192
- MAX_INPUT_LENGTH=6144
- HUGGING_FACE_HUB_TOKEN=${HF_TOKEN:-none}
- MAX_BATCH_PREFILL_TOKENS=6144
- CUDA_MEMORY_FRACTION=0.8
- MAX_TOP_N_TOKENS=30
- ENABLE_CUDA_GRAPHS
- QUANTIZE=eetq
CUDA 12.2
GPUs: 2x NVIDIA A100 80GB
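The environment variables above are in docker-compose style; a minimal service definition consistent with them might look like the following (the image tag, volume path, port mapping, and shm_size are my assumptions, not taken from the report):

```yaml
# Hypothetical docker-compose service reconstructed from the settings above.
# Image tag, volume, port, and shm_size are assumptions.
services:
  tgi:
    image: ghcr.io/huggingface/text-generation-inference:2.0.2
    environment:
      - MODEL_ID=Meta-Llama-3-70B-Instruct
      - NUM_SHARD=2
      - MAX_TOTAL_TOKENS=8192
      - MAX_INPUT_LENGTH=6144
      - HUGGING_FACE_HUB_TOKEN=${HF_TOKEN:-none}
      - MAX_BATCH_PREFILL_TOKENS=6144
      - CUDA_MEMORY_FRACTION=0.8
      - MAX_TOP_N_TOKENS=30
      - ENABLE_CUDA_GRAPHS
      - QUANTIZE=eetq
    volumes:
      - ./data:/data
    ports:
      - "8080:80"
    shm_size: 1gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
```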
Using the LangChain HuggingFaceEndpoint client through the invoke function,
with the following generation parameters (a client-side sketch follows the list):
top_k=None
top_p=0.95
temperature=0.6
stop=["<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>", "<|reserved_special_token"]
max_new_tokens=1000
return_only_new_tokens=True
frequency_penalty=None
repetition_penalty=None
seed=42
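For completeness, here is a hedged sketch of the client call, assuming the community LangChain wrapper; the endpoint URL and token are placeholders, `do_sample=True` is my assumption (implied by temperature/top_p/seed), and `return_only_new_tokens` from the list above is rendered as its closest wrapper equivalent, `return_full_text=False`:

```python
# Sketch of the generation call via the LangChain community wrapper.
# Endpoint URL and token are placeholders; see the lead-in for assumptions.
from langchain_community.llms import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url="https://<your-tgi-endpoint>",  # placeholder
    huggingfacehub_api_token="<HF_TOKEN>",       # placeholder
    do_sample=True,  # assumption: sampling is implied by the params below
    top_k=None,
    top_p=0.95,
    temperature=0.6,
    stop_sequences=[
        "<|start_header_id|>",
        "<|end_header_id|>",
        "<|eot_id|>",
        "<|reserved_special_token",
    ],
    max_new_tokens=1000,
    return_full_text=False,  # closest equivalent of return_only_new_tokens=True
    repetition_penalty=None,
    seed=42,
)

reply = llm.invoke(prompt)  # `prompt` assembled as in the sketch above
```

With the seed fixed at 42, the degenerate dash output should be deterministic, which is what would make the run reproducible once the prompt is shared.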
I wonder if I am the only one experiencing this; I can share the prompt in private for reproducibility purposes.