Model Issue

#1
by YOYO-AI - opened

The current merged version struggles with long-chain reasoning and tends to provide immediate answers directly. Would it be possible to explore re-merging the model to address this limitation?

FuseAI org

We have noticed this problem and are trying to fix it. It may be caused by the significantly different parameter spaces of Qwen2.5-Coder-32B and DeepSeek-R1-32B.

I see a new version has been uploaded, @Wanfq. Any comments on the changes in this new release? Does it fix the issue that was discussed here?

FuseAI org

Yes, we changed the base pretrained model from Qwen2.5-32B to Qwen2.5-Coder-32B. This does fix the issue. Results are shown below:

| Model | LiveCodeBench | LiveCodeBench-Easy | LiveCodeBench-Medium | LiveCodeBench-Hard |
|---|---|---|---|---|
| OpenAI o1 | 63.4 | 98.5 | 80.9 | 31.7 |
| OpenAI o1-preview | 42.7 | 97.0 | 47.2 | 9.8 |
| OpenAI o1-mini | 52.0 | 91.0 | 67.4 | 19.5 |
| DeepSeek R1 | 62.8 | 98.4 | 78.3 | 32.2 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | 56.1 | 93.6 | 73.1 | 23.4 |
| Qwen/QwQ-32B-Preview | 44.4 | 94.9 | 53.8 | 10.0 |
| NovaSky-AI/Sky-T1-32B-Preview | 37.3 | 89.7 | 40.4 | 6.6 |
| FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview | 56.4 | 92.9 | 73.5 | 24.2 |
| FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview | 54.8 | 93.9 | 71.7 | 21.3 |
| FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview | 58.2 | 94.3 | 77.1 | 25.0 |
| FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview | 57.9 | 93.6 | 76.0 | 25.5 |
