This is my experiment with training a reasoning model using TRL's GRPO (Group Relative Policy Optimization) trainer and the Unsloth API.
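
Training (sketch):

The training script is not included here, but below is a minimal sketch of what a GRPO setup with Unsloth and TRL can look like. The dataset name and column, the LoRA settings, the reward function, and all hyperparameters are illustrative assumptions, not the exact values used to train this model.

import re
import torch
from datasets import load_dataset
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer

# Load the 4-bit base model and attach LoRA adapters (typical Unsloth setup).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit",
    max_seq_length = 1024,
    dtype = torch.bfloat16,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

SYSTEM_PROMPT = """
Respond in the following format:
<think>
...
</think>
<answer>
...
</answer>
"""

# Assumed dataset: an OpenR1 math dataset with a "problem" column; swap in the one actually used.
dataset = load_dataset("open-r1/OpenR1-Math-220k", split = "train")
dataset = dataset.map(lambda x: {
    "prompt": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": x["problem"]},
    ],
})

# Simple format reward: 1.0 if the completion follows the <think>/<answer> template, else 0.0.
def format_reward(completions, **kwargs):
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    responses = [completion[0]["content"] for completion in completions]
    return [1.0 if re.search(pattern, r, re.DOTALL) else 0.0 for r in responses]

training_args = GRPOConfig(
    output_dir = "qwen2.5-3b-openr1-math-grpo",
    learning_rate = 5e-6,
    per_device_train_batch_size = 8,
    num_generations = 8,  # completions sampled per prompt for the group-relative baseline
    max_prompt_length = 256,
    max_completion_length = 768,
    max_steps = 500,
    logging_steps = 10,
)

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [format_reward],
    args = training_args,
    train_dataset = dataset,
)
trainer.train()

In practice you would pair the format reward with a correctness reward that checks the extracted answer against the dataset's reference solution.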

Inference:

Using Unsloth API (For Faster Inference):

import torch
from unsloth import FastLanguageModel
from transformers import TextStreamer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "ubermenchh/Qwen2.5-3B-openr1-math",
    max_seq_length = 1024,
    dtype = torch.bfloat16,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference mode

SYSTEM_PROMPT = """
Respond in the following format:
<think>
...
</think>
<answer>
...
</answer>
"""

test_question = """
Let $z \in \mathbf{C}$, satisfying the condition $a z^{n}+b \mathrm{i} z^{n-1}+b \mathrm{i} z-a=0, a, b \in \mathbf{R}, m \in$ $\mathbf{N}$, find $|z|$.
"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": test_question},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids, streamer = text_streamer, max_new_tokens = 2048, pad_token_id = tokenizer.eos_token_id)
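
TextStreamer only prints tokens as they are generated. If you also want the completion as a string (for example, to parse out the <answer> block), you can decode the returned ids and slice off the prompt, e.g.:

output_ids = model.generate(input_ids, max_new_tokens = 2048, pad_token_id = tokenizer.eos_token_id)
completion = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens = True)
print(completion)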

Using Transformers API:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
  "ubermenchh/Qwen2.5-3B-openr1-math",
  torch_dtype=torch.bfloat16,
  device_map="auto",
  trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
  "ubermenchh/Qwen2.5-3B-openr1-math",
  trust_remote_code=True
)

SYSTEM_PROMPT = """
Respond in the following format:
<think>
...
</think>
<answer>
...
</answer>
"""

problem = "Let $z \in \mathbf{C}$, satisfying the condition $a z^{n}+b \mathrm{i} z^{n-1}+b \mathrm{i} z-a=0, a, b \in \mathbf{R}, m \in$ $\mathbf{N}$, find $|z|$."
prompt = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": problem}
]

input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=3000,
    do_sample=True,  # sampling must be enabled for temperature to take effect
    temperature=1.3,
    num_return_sequences=1,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens (the prompt is sliced off).
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print("Question:\n", problem)
print("\n\nResponse:\n", response)

Uploaded model

  • Developed by: ubermenchh
  • License: apache-2.0
  • Finetuned from model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit

This Qwen2.5 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
