---
datasets:
- nvidia/OpenCodeReasoning-2
- GetSoloTech/Code-Reasoning
base_model:
- openai/gpt-oss-20b
library_name: transformers
tags:
- code-reasoning
- vllm
pipeline_tag: text-generation
---
<img src="gpt-oss-reasoning.png" width="700"/>
### Overview
- Base model: `openai/gpt-oss-20b`
- Objective: Supervised fine-tuning for competitive programming and algorithmic reasoning
- Dataset: `nvidia/OpenCodeReasoning-2` (OCR-2), combining `python` and `cpp` splits. Each sample reconstructs the upstream question and uses the dataset's `r1_generation` as the assistant response
- Context length: 4096 tokens
- Training method: LoRA SFT via TRL `SFTTrainer` (a sketch follows below)
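The exact training script is not published in this card; the following is a minimal sketch of a comparable TRL + PEFT setup. The LoRA rank, alpha, target modules, and output path are illustrative assumptions, not the actual values used for this checkpoint.

```python
# Minimal sketch only: hyperparameters and paths are illustrative,
# not the actual values used to train this checkpoint.
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

peft_config = LoraConfig(
    r=16,                         # assumed rank, for illustration
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="openai/gpt-oss-20b",
    train_dataset=train_dataset,  # chat-formatted examples (see Dataset Construction Notes)
    peft_config=peft_config,
    args=SFTConfig(
        max_seq_length=4096,      # matches the context length above
        output_dir="gpt-oss-code-reasoning-20b",
    ),
)
trainer.train()
```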
### Intended Use
- Intended: Generating Python/C++ solutions and reasoning for competitive programming tasks
- Out of scope: Safety-critical applications. May hallucinate or produce incorrect/inefficient code
### Prompt Format
This model was trained in a chat format. Recommended structure:
```python
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
```
If you prefer plain text, place the problem text after a brief instruction, but chat format generally yields better results.
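For reference, a plain-text prompt might look like this (the instruction wording is illustrative):

```python
# Plain-text alternative; assumes problem_text as defined above.
# Generally weaker than the chat template.
prompt = (
    "You are an expert competitive programmer. Solve the following problem "
    "and provide a correct, efficient solution.\n\n"
    + problem_text
)
```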
### Reasoning Effort
Specify the reasoning effort in `apply_chat_template`; supported values are `"low"`, `"medium"` (default), and `"high"`:
```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "Explain why the meaning of life is 42"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",
).to(model.device)
generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```
### Quick Start (Transformers)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "GetSoloTech/gpt-oss-code-reasoning-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

problem_text = """
You are given an array of integers ... (your problem here)
"""

messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]

input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    reasoning_effort="medium",
)
inputs = tokenizer([input_text], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=768,
    temperature=0.3,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Generation Tips
- Reasoning style: Lower temperature (0.2–0.5) for clearer step-by-step reasoning
- Length: Use `max_new_tokens` 512–1024 for full solutions; shorter for hints
- Stop tokens: If you only want the final code, consider post-processing the model output to extract the last code block, as in the helper sketched below
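For example, a small helper (hypothetical, not shipped with this repo) that keeps only the final fenced code block:

```python
import re

def extract_last_code_block(output: str) -> str | None:
    """Return the contents of the last ```-fenced block, or None if absent."""
    blocks = re.findall(r"```(?:\w+)?\n(.*?)```", output, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else None
```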
### Dataset Construction Notes
- Source: `nvidia/OpenCodeReasoning-2` with `python` and `cpp` splits
- For each split, the script:
- Shuffles and selects up to `--take_samples` examples per split
- Reconstructs the problem statement from upstream benchmarks (TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`)
- Filters out rows with missing/empty questions or assistant responses
- Builds chat-style `messages` and a formatted `text` field with the tokenizer's chat template
- The final training set is the concatenation of both splits, followed by an optional `train_test_split` according to `--eval_ratio`
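In outline, the preparation looks roughly like this (dataset column names, sample counts, and the eval ratio are illustrative assumptions; the actual script is not included here):

```python
# Rough sketch of the steps above; column names and numeric values
# are assumptions, not the actual script.
from datasets import concatenate_datasets, load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

def prepare_split(name, take_samples):
    ds = load_dataset("nvidia/OpenCodeReasoning-2", split=name).shuffle(seed=42)
    ds = ds.select(range(min(take_samples, len(ds))))
    # Filter out rows with missing/empty questions or assistant responses
    ds = ds.filter(lambda ex: bool(ex.get("question")) and bool(ex.get("r1_generation")))

    def to_chat(ex):
        messages = [
            {"role": "user", "content": ex["question"]},            # reconstructed problem
            {"role": "assistant", "content": ex["r1_generation"]},  # assistant response
        ]
        return {
            "messages": messages,
            "text": tokenizer.apply_chat_template(messages, tokenize=False),
        }

    return ds.map(to_chat)

# Concatenate both splits, then optionally carve out an eval set (--eval_ratio)
train = concatenate_datasets([prepare_split("python", 50_000), prepare_split("cpp", 50_000)])
splits = train.train_test_split(test_size=0.02)  # 0.02 is an illustrative ratio
```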
### Acknowledgements
- Unsloth (`FastLanguageModel`) for efficient 4-bit loading and fast PEFT
- TRL (`SFTTrainer`) for straightforward supervised fine-tuning
- NVIDIA OpenCodeReasoning-2 and upstream benchmarks (TACO, APPS, CodeContests, `open-r1/codeforces`)
---