# Korean S1K-1.1
- Generation Result: werty1248/qwen-32b-s1.1-Ko-Native-result
- AIME2024 (Korean): 30% (9/30) with `max_tokens = 8192`
- Chinese appears in the think tokens (not hallucination or failure behavior: the model solves correctly in Chinese, then answers in Korean)
## Training Details
- Trained with the official s1k code; key settings (a hedged training sketch follows this list):
  - `block_size = 20000`
  - `gradient_checkpointing = True`
- Training Dataset: werty1248/s1k-1.1-Ko-ReGenerated-Formatted
  - Translated the questions from s1k-1.1 into Korean, then regenerated the reasoning traces with DeepSeek-R1.
  - The `text` column needs to be converted to the Qwen chat format (a conversion sketch follows this list).
- 8x H200 SXM, 2 hours; $63.84 (not counting the cost of trial and error :( )
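
Below is a hedged sketch of the `text`-column conversion mentioned above. The column names (`question`, `deepseek_thinking_trajectory`, `deepseek_attempt`) and the think/answer markers follow s1K-1.1 conventions but are assumptions here; check them against the actual dataset before use:

```python
# Hedged sketch of rebuilding the "text" column in Qwen chat format.
# Column names and think/answer markers are assumptions, not confirmed.
from datasets import load_dataset

QWEN_TEMPLATE = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n{question}<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<|im_start|>think\n{thinking}\n"
    "<|im_start|>answer\n{answer}<|im_end|>"
)

def to_qwen_text(example):
    example["text"] = QWEN_TEMPLATE.format(
        question=example["question"],                      # assumed column
        thinking=example["deepseek_thinking_trajectory"],  # assumed column
        answer=example["deepseek_attempt"],                # assumed column
    )
    return example

ds = load_dataset("werty1248/s1k-1.1-Ko-ReGenerated-Formatted", split="train")
ds = ds.map(to_qwen_text)
```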
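
Given the converted dataset, the training settings above could map onto a `trl`-based SFT run roughly as follows. This is an assumption-laden approximation, not the actual s1k training script; the base model id and batch settings are guesses:

```python
# Hedged sketch, NOT the official s1k training code: shows where
# block_size (as max sequence length) and gradient_checkpointing fit.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("werty1248/s1k-1.1-Ko-ReGenerated-Formatted", split="train")

config = SFTConfig(
    output_dir="qwen2.5-32b-s1.1-ko-native",
    dataset_text_field="text",      # the column discussed above
    max_seq_length=20000,           # block_size = 20000
    gradient_checkpointing=True,
    per_device_train_batch_size=1,  # assumption for 20k-token sequences
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # assumed base model
    args=config,
    train_dataset=dataset,
)
trainer.train()
```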
## Evaluation
- HRM8K (Korean math benchmark)
  - Generated and evaluated with my own code; accuracy may differ from the official numbers.
| Model | GSM8K | KSM | MATH | OMNI_MATH |
|---|---|---|---|---|
| Qwen2.5-32B-s1.1-Ko-Native | 89.92 | 39.85 | 87.73 | 42.06 |
| *GPT-4o | 91.21 | 22.83 | 74.45 | 30.75 |
| *GPT-4o-mini | 87.57 | 19.40 | 70.68 | 26.45 |
| EXAONE-3.5-7.8B-Stratos-Ko | 83.02 | 15.97 | 67.49 | 24.62 |
| Qwen2.5-7B-s1.1-Ko-Native | 76.27 | 15.48 | 66.45 | 23.57 |
| EXAONE-3.5-7.8B-Instruct | 81.58 | 14.71 | 63.50 | 21.69 |
| *Qwen2.5-14B-Instruct | 66.34 | 15.55 | 53.38 | 20.64 |
| *Llama-3.1-8B-Instruct | 77.79 | 7.21 | 49.01 | 15.92 |
| *Qwen2.5-7B-Instruct | 58.38 | 13.10 | 48.04 | 16.55 |
| *EXAONE-3.0-7.8B-Instruct | 72.33 | 7.98 | 46.79 | 15.35 |
| *Ko-R1-1.5B-preview | 43.3 | ? | 73.1 | 29.8 |
\* Reported by the HRM8K authors
- Generation
  - `temperature = 0.7`
  - `top_p = 0.95`
  - `max_tokens = 8192`
  - If generation exceeded `max_tokens` without emitting the `</think>` token, `</think>` was appended and up to 512 additional tokens were generated (see the sketch after this list).
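
As a concrete illustration of the settings and the `</think>`-forcing step above, here is a minimal sketch assuming vLLM; the model id, example prompt, and loop structure are illustrative assumptions, not the author's actual script:

```python
# Hedged sketch of generation plus </think> forcing, assuming vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="werty1248/Qwen2.5-32B-s1.1-Ko-Native")  # assumed model id
prompts = ["1부터 100까지의 정수의 합을 구하시오."]  # hypothetical example prompt

params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=8192)
completions = []
for prompt, out in zip(prompts, llm.generate(prompts, params)):
    text = out.outputs[0].text
    if "</think>" not in text:
        # Hit the budget without closing the think block: force-close it
        # and let the model produce the final answer in 512 more tokens.
        cont = llm.generate(
            [prompt + text + "</think>"],
            SamplingParams(temperature=0.7, top_p=0.95, max_tokens=512),
        )[0].outputs[0].text
        text += "</think>" + cont
    completions.append(text)
```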
- Evaluation
  - Custom parser and evaluation code were used; some answers may have been parsed incorrectly (a sketch of the kind of parsing involved follows).
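
The evaluation code itself is not shown in this card; the sketch below illustrates the kind of `\boxed{}` answer extraction such a parser typically performs. The function and its fallback behavior are hypothetical:

```python
def extract_boxed_answer(text: str) -> str | None:
    """Hypothetical parser: pull the last \\boxed{...} span, handling
    nested braces; returns None when nothing parseable is found."""
    start = text.rfind(r"\boxed{")
    if start == -1:
        return None
    i = start + len(r"\boxed{")
    depth = 1
    out = []
    while i < len(text) and depth > 0:
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        out.append(ch)
        i += 1
    return "".join(out) if depth == 0 else None

# Example: prints "42"
print(extract_boxed_answer(r"... the answer is \boxed{42}."))
```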
## Why Qwen? Why not EXAONE?