Meta-Llama-3.1-405B-Instruct-FP8 seems to be misconfigured

#534
by rymiel - opened

With a sufficiently long conversation using Meta-Llama-3.1-405B-Instruct-FP8, this error occurs 100% of the time, making the model unusable after a certain point:

```
Input validation error: inputs tokens + max_new_tokens must be <= 16384. Given: 14337 inputs tokens and 2048 max_new_tokens
```

For Meta-Llama-3.1-405B-Instruct-FP8, truncate is set to 14337 and max_new_tokens is set to 2048. Added together, these are 16385 tokens, which is 2^14 + 1. This looks like an off-by-one: the sum will always fail the `<= 16384` check.

In comparison, for Meta-Llama-3.1-70B-Instruct, truncate is set to 7167 and max_new_tokens is set to 1024. Added together, these are 8191 tokens, which is 2^13 - 1. That looks like another off-by-one, but in the other direction: one token short of the limit rather than one over, so it passes the check but wastes a token.

I'm not sure where max_total_tokens is being set, though.
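
For reference, here's a minimal sketch (not the actual chat-ui code; the constant names and the 16384 limit are my assumptions based on the error message above) of how truncate could be derived from the other two values so the check always passes:

```ts
// Hypothetical sketch: derive truncate from the server-side limit instead
// of hard-coding both values independently. MAX_TOTAL_TOKENS is assumed to
// be the 16384 (2^14) limit reported in the validation error.
const MAX_TOTAL_TOKENS = 16384;
const MAX_NEW_TOKENS = 2048;

// Leave exactly MAX_NEW_TOKENS of headroom for generation:
// 16384 - 2048 = 14336, i.e. one less than the current value of 14337.
const truncate = MAX_TOTAL_TOKENS - MAX_NEW_TOKENS;

console.log(truncate); // 14336
```

With truncate = 14336, inputs tokens + max_new_tokens comes to exactly 16384, which satisfies `<= 16384`.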

Similar to #430

Hugging Chat org

Thanks for bringing this up, will take a look

Hugging Chat org

Should be fixed! Let me know if you're still having issues

nsarrazin changed discussion status to closed