About "model_max_length": 16384

#11

by AlexWuKing - opened 5 days ago


The original model_max_length of Qwen/Qwen2.5-7B-Instruct is 131072,
but in the distilled model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B it is set to 16384.

I wonder why it was reduced?
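For reference, here is a minimal sketch of the comparison. It assumes the value is read from a `model_max_length` field in each repo's tokenizer config JSON. The two JSON fragments and the helper `read_max_length` are hypothetical stand-ins, and only the two numbers come from the post above.

```python
import json

# Hypothetical config fragments: only the two "model_max_length"
# values are taken from the post; everything else is illustrative.
QWEN_CONFIG = '{"model_max_length": 131072}'
DISTILL_CONFIG = '{"model_max_length": 16384}'


def read_max_length(config_text: str) -> int:
    """Return the model_max_length field from a config JSON string."""
    return int(json.loads(config_text)["model_max_length"])


base_len = read_max_length(QWEN_CONFIG)
distill_len = read_max_length(DISTILL_CONFIG)

# Integer ratio between the base model's limit and the distilled model's limit.
ratio = base_len // distill_len
print(f"base={base_len}, distill={distill_len}, ratio={ratio}x")
```

So the distilled model's configured limit is 8 times smaller than the base model's.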
