The original model_max_length of Qwen/Qwen2.5-7B-Instruct is 131072, but in the distilled model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B it is set to 16384.
I wonder why it was reduced?
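
For anyone who wants to check the two values themselves, here is a minimal sketch. It assumes the field lives in each repo's tokenizer_config.json and that both files have already been downloaded locally; the function names and file paths are hypothetical and only for illustration.

```python
import json


def read_model_max_length(config_path):
    """Return `model_max_length` from a tokenizer_config.json file, or None if absent."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    return config.get("model_max_length")


def compare_max_lengths(base_config_path, distill_config_path):
    """Read both configs and report the two limits plus their ratio (base / distill)."""
    base = read_model_max_length(base_config_path)
    distill = read_model_max_length(distill_config_path)
    # Only compute a ratio when both values are present and non-zero.
    ratio = base / distill if base and distill else None
    return {"base": base, "distill": distill, "ratio": ratio}


if __name__ == "__main__":
    # Hypothetical local paths; point these at the downloaded config files.
    result = compare_max_lengths(
        "qwen2.5-7b-instruct/tokenizer_config.json",
        "deepseek-r1-distill-qwen-7b/tokenizer_config.json",
    )
    print(result)
```

With the values quoted above (131072 and 16384), the distilled model's limit is one eighth of the base model's.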