---
datasets:
- nvidia/OpenCodeReasoning-2
- GetSoloTech/Code-Reasoning
base_model:
- openai/gpt-oss-20b
library_name: transformers
tags:
- code-reasoning
- vllm
pipeline_tag: text-generation
---

<img src="gpt-oss-reasoning.png" width="700"/>

### Overview

- Base model: `openai/gpt-oss-20b`
- Objective: Supervised fine-tuning for competitive programming and algorithmic reasoning
- Dataset: `nvidia/OpenCodeReasoning-2` (OCR-2), combining `python` and `cpp` splits. Each sample reconstructs the upstream question and uses the dataset's `r1_generation` as the assistant response
- Context length: 4096 tokens
- Training method: LoRA SFT via TRL `SFTTrainer`
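
The training method above can be sketched with TRL and PEFT as follows. This is a minimal, hypothetical reproduction sketch, not the actual training script: the real run is acknowledged to have used Unsloth's `FastLanguageModel` for 4-bit loading, and every hyperparameter below (LoRA rank, target modules, learning rate, batch size, epochs) is an assumption, not the value used for this checkpoint.

```python
# Hypothetical LoRA SFT sketch with TRL; all hyperparameters are illustrative only.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base_model = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto", device_map="auto")

# Placeholder dataset; in practice the `text` column is built as described in
# the Dataset Construction Notes below.
train_ds = Dataset.from_dict({"text": ["<formatted chat sample goes here>"]})

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption: attention projections
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=train_ds,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="gpt-oss-code-reasoning-20b-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
    ),
)
trainer.train()
```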

### Intended Use

- Intended: Generating Python/C++ solutions and reasoning for competitive programming tasks
- Out of scope: Safety-critical applications. May hallucinate or produce incorrect/inefficient code

### Prompt Format

This model was trained in a chat format. Recommended structure:

```python
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
```

If you prefer plain text, place the problem text after a brief instruction, but chat format generally yields better results.
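
For example, a plain-text prompt might look like this (the instruction wording is only an illustration, not a prompt format the model was trained on):

```python
# Plain-text fallback; the instruction wording is illustrative, not a trained prompt format.
problem_text = "You are given an array of integers ... (your problem here)"
prompt = (
    "Solve the following competitive programming problem. "
    "Explain your reasoning, then provide a complete, efficient solution.\n\n"
    + problem_text
)
```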

### Reasoning Effort

Specify the reasoning effort via the `reasoning_effort` argument of `apply_chat_template` (supported values: `"low"`, `"medium"` (default), or `"high"`):

```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "Explain why the meaning of life is 42"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```

### Quick Start (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "GetSoloTech/gpt-oss-code-reasoning-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

problem_text = """
You are given an array of integers ... (your problem here)
"""

messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]

input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    reasoning_effort="medium",
)

inputs = tokenizer([input_text], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=768,
    do_sample=True,  # enable sampling so temperature/top_p take effect
    temperature=0.3,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Generation Tips

- Reasoning style: Lower temperature (0.2–0.5) for clearer step-by-step reasoning
- Length: Use `max_new_tokens` 512–1024 for full solutions; shorter for hints
- Stop tokens: If you only want final code, consider post-processing the model output to extract the last code block
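
For the last point, a small post-processing helper might look like this (this function is not part of the model or its tokenizer, just a sketch over standard Markdown fences):

```python
import re

FENCE = "`" * 3  # Markdown code fence used around code in the model's answers

def extract_last_code_block(output_text: str):
    """Return the body of the last fenced code block in output_text, or None."""
    pattern = FENCE + r"[a-zA-Z0-9_+#-]*\n(.*?)" + FENCE
    blocks = re.findall(pattern, output_text, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else None

# Usage with the Quick Start above:
# answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
# final_code = extract_last_code_block(answer)
```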


### Dataset Construction Notes

- Source: `nvidia/OpenCodeReasoning-2` with `python` and `cpp` splits
- For each split, the script:
  - Shuffles and selects up to `--take_samples` examples per split
  - Reconstructs the problem statement from upstream benchmarks (TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`)
  - Filters out rows with missing/empty questions or assistant responses
  - Builds chat-style `messages` and a formatted `text` field with the tokenizer's chat template
- The final training set is the concatenation of both splits, followed by an optional `train_test_split` according to `--eval_ratio`
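
A simplified, hypothetical sketch of this pipeline follows. The upstream question reconstruction is replaced with a stub, and every column name except `r1_generation` (named above) is an assumption about the dataset schema:

```python
# Simplified construction sketch; the real script's upstream question reconstruction
# (TACO, APPS, CodeContests, open-r1/codeforces) is replaced by a hypothetical stub.
from datasets import concatenate_datasets, load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
take_samples, eval_ratio = 10_000, 0.01  # stand-ins for --take_samples / --eval_ratio

def reconstruct_question(example):
    # Hypothetical placeholder: the actual script looks the statement up in the
    # upstream benchmark referenced by the row.
    return example.get("question", "") or ""

def build_split(name):
    # "python"/"cpp" are referred to as splits above; adjust if they are configs instead.
    raw = load_dataset("nvidia/OpenCodeReasoning-2", split=name).shuffle(seed=42)
    raw = raw.select(range(min(take_samples, len(raw))))

    def to_chat(ex):
        messages = [
            {"role": "user", "content": reconstruct_question(ex)},
            {"role": "assistant", "content": ex.get("r1_generation", "") or ""},
        ]
        return {
            "messages": messages,
            "text": tokenizer.apply_chat_template(messages, tokenize=False),
        }

    ds = raw.map(to_chat)
    # Drop rows with a missing/empty question or assistant response.
    return ds.filter(lambda ex: ex["messages"][0]["content"] and ex["messages"][1]["content"])

dataset = concatenate_datasets([build_split("python"), build_split("cpp")])
dataset = dataset.train_test_split(test_size=eval_ratio)
```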


### Acknowledgements

- Unsloth (`FastLanguageModel`) for efficient 4-bit loading and fast PEFT
- TRL (`SFTTrainer`) for straightforward supervised fine-tuning
- NVIDIA OpenCodeReasoning-2 and upstream benchmarks (TACO, APPS, CodeContests, `open-r1/codeforces`)

---