---
license: apache-2.0
datasets:
  - gghfez/QwQ-LongCoT-130K-cleaned
language:
  - en
  - zh
base_model:
  - Qwen/Qwen2.5-Coder-1.5B-Instruct
---

This is Qwen/Qwen2.5-Coder-1.5B-Instruct fine-tuned on 100 system_prompt/question/answer samples from Qwen/QwQ-32B-Preview, taken from the gghfez/QwQ-LongCoT-130K-cleaned dataset.

It is intended to be used as a draft model for QwQ.

Please use the following prompt format for the best results:

```
<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant. You should think step-by-step.<|im_end|>\n<|im_start|>user\nA unit has 200 employees. Now, 40 employees need to be selected as a sample using the systematic sampling method. All employees are randomly numbered from 1 to 200 and evenly divided into 40 groups according to their numbers in order (1-5, 6-10, ..., 196-200). If the number drawn from the 5th group is 23, then the number drawn from the 10th group should be.<|im_end|>\n<|im_start|>assistant\n
```
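If you build the prompt yourself instead of relying on the GGUF's built-in chat template, the ChatML layout above can be assembled like this (a minimal sketch; the `build_prompt` helper is hypothetical, not part of the model):

```python
# Hypothetical helper that wraps a question in the recommended ChatML format.
SYSTEM_PROMPT = (
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant. "
    "You should think step-by-step."
)

def build_prompt(question: str) -> str:
    """Return the full prompt string in the Qwen ChatML format."""
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt("What is 2 + 2?")
```

The trailing `<|im_start|>assistant\n` (with no closing tag) is what cues the model to start generating its answer.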

I recommend that you do not change the system prompt.

Tested as a draft model for QwQ using the "EXTREME Logic Test: The Mysteries of the Seven Artifacts" from @code4AI (Discover AI on YouTube): https://www.youtube.com/watch?v=EWTWS1nghY0

| Model Name | Speculative Decoding | Predicted Tokens | Cached Tokens | Time per Token (ms) | Tokens per Second | Solutions Found | Time to Solution (s) |
|---|---|---|---|---|---|---|---|
| Baseline | No | 9120 | 9713 | 73 | 13.63 | 1 | 669.13 |
| qwen2.5-coder-7b-instruct-q4_0.gguf | Yes | 15349 | 15942 | 78 | 12.89 | 2 | 1190.46 |
| qwen2.5-coder-7b-instruct-q2_k.gguf | Yes | 9344 | 9937 | 80 | 12.44 | 1 | 750.96 |
| qwen2.5-coder-1.5b-instruct-q4_0.gguf | Yes | 8882 | 9475 | 69 | 14.49 | 2 | 613.01 |
| qwen2.5-coder-0.5b-instruct-q4_0.gguf | Yes | 9730 | 10323 | 71 | 14.04 | 2 | 692.95 |
| qwen-2.5-1.5b-finetuned-qwq-25.13.01.Q4_0.gguf | Yes | 5553 | 6146 | 65 | 15.36 | 1 | 361.52 |
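As a rough sanity check on the numbers above, the speed-up of the fine-tuned draft model over the no-speculation baseline can be computed directly (a small illustrative script, not part of the benchmark):

```python
# Values copied from the benchmark table above.
baseline_s = 669.13            # Baseline time to solution (seconds)
finetuned_s = 361.52           # Fine-tuned 1.5B draft, time to solution (seconds)
baseline_ms_per_tok = 73       # Baseline time per token (ms)
finetuned_ms_per_tok = 65      # Fine-tuned draft time per token (ms)

solution_speedup = baseline_s / finetuned_s          # ~1.85x end to end
per_token_speedup = baseline_ms_per_tok / finetuned_ms_per_tok  # ~1.12x per token

print(f"time-to-solution speed-up: {solution_speedup:.2f}x")
print(f"per-token speed-up: {per_token_speedup:.2f}x")
```

Note that most of the end-to-end gain comes from the shorter run (5553 predicted tokens vs 9120) rather than from raw per-token speed alone.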

The test system was an M1 MacBook Pro with 64 GB of RAM, running llama.cpp build b4481.

Settings:
- Predictions: -1
- Temperature: 0.7
- Penalize repeat sequence: 1
- Consider N tokens for penalize: 256
- Top-K sampling: 40
- Top-P sampling: 0.95
- Min-P sampling: 0.05
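For reference, these settings map onto a llama.cpp server `/completion` request body roughly as follows (a hedged sketch; the field names assume the llama-server HTTP API, so verify them against your llama.cpp build):

```python
import json

# Sampling settings from above, expressed as a /completion request payload.
payload = {
    "n_predict": -1,        # Predictions: -1 (no fixed generation limit)
    "temperature": 0.7,
    "repeat_penalty": 1.0,  # Penalize repeat sequence: 1 (effectively disabled)
    "repeat_last_n": 256,   # Consider N tokens for penalize
    "top_k": 40,
    "top_p": 0.95,
    "min_p": 0.05,
}
body = json.dumps(payload)
```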