Phi-4 o1 [ Responsible Mathematical Problem Solving & Reasoning Capabilities ]

Phi-4 o1 [ Responsible Mathematical Problem Solving & Reasoning Capabilities ] is an open model fine-tuned for advanced reasoning tasks. It is based on Microsoft’s Phi-4, which is built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal is a small, capable model, trained on high-quality data, that excels at responsible reasoning and mathematical problem-solving.

The Phi-4 o1 model has undergone robust safety post-training using a combination of SFT (Supervised Fine-Tuning) and iterative DPO (Direct Preference Optimization) techniques. The safety alignment process includes publicly available datasets and proprietary synthetic datasets to improve helpfulness, harmlessness, and responsible AI usage.

Dataset Info

Phi-4 o1 ft is fine-tuned on a synthetic dataset curated through a specially designed pipeline. The dataset combines the Math IO (Input-Output) methodology with step-by-step problem-solving approaches. The aim is a model that is effective at:

Responsible mathematical problem-solving
Logical reasoning
Stepwise breakdowns of complex tasks

The dataset design focuses on enabling the model to generate detailed, accurate, and logically coherent solutions for mathematical and reasoning-based tasks.

Run with Transformers

To use Phi-4 o1 ft for text generation tasks, follow the example below:

Example Usage

# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Phi-4-Math-IO")
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/Phi-4-Math-IO",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Input prompt
input_text = "Solve the equation: 2x + 3 = 11. Provide a stepwise solution."
# Move the inputs to wherever device_map="auto" placed the model
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate output
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

For structured dialogue generation, you can apply the chat template as follows:

# Structured input for chat-style interaction
messages = [
    {"role": "user", "content": "Explain Pythagoras’ theorem with an example."},
]
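inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker so the model answers
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Generate response
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))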
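
For decoder-only models, generate returns the prompt tokens followed by the newly generated ones, so decoding the full sequence repeats the prompt. The sketch below shows one way to keep only the new tokens. The helper name strip_prompt is hypothetical and not part of Transformers, and the token IDs are toy values standing in for real model output.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def strip_prompt(output_ids: Sequence[T], prompt_length: int) -> Sequence[T]:
    """Return only the tokens that follow the first `prompt_length` tokens."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]


# Toy token IDs standing in for real model output (illustrative values only).
prompt_ids = [101, 102, 103]
output_ids = prompt_ids + [7, 8, 9]

new_tokens = strip_prompt(output_ids, len(prompt_ids))
print(new_tokens)  # [7, 8, 9]
```

With the tensors from the examples above, the prompt length would be the size of the last dimension of the tokenized prompt, and the sliced sequence can then be passed to tokenizer.decode.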

Intended Use

Phi-4 o1 ft is designed for a wide range of reasoning-intensive and math-focused applications. Below are some key use cases:

1. Responsible Mathematical Problem Solving

Solving complex mathematical problems with detailed, step-by-step solutions.
Assisting students, educators, and researchers in understanding advanced mathematical concepts.

2. Reasoning and Logical Problem Solving

Breaking down intricate problems in logic, science, and other fields into manageable steps.
Providing responsible and accurate reasoning capabilities for critical applications.

3. Educational Tools

Supporting educational platforms with explanations, tutoring, and Q&A support.
Generating practice problems and solutions for students.

4. Content Creation

Assisting content creators in generating accurate and logical educational content.
Helping with technical documentation by providing precise explanations.

5. Customer Support

Automating responses to technical queries with logical stepwise solutions.
Providing accurate, responsible, and coherent information for complex questions.

Limitations

While Phi-4 o1 ft is highly capable in reasoning and mathematics, users should be aware of its limitations:

1. Bias and Fairness

Despite rigorous training, the model may still exhibit biases from its training data. Users are encouraged to carefully review outputs, especially for sensitive topics.

2. Contextual Understanding

The model may sometimes misinterpret ambiguous or complex prompts, leading to incorrect or incomplete responses.
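
Because a generated solution can be wrong, it may be worth checking the final answer independently where the problem allows it. The following is a minimal sketch for linear equations of the form a*x + b = c, such as the 2x + 3 = 11 prompt used earlier. The function name check_linear_solution is hypothetical, and extracting the final answer from the model's text is assumed to happen elsewhere.

```python
from fractions import Fraction


def check_linear_solution(a, b, c, x) -> bool:
    """Return True if x satisfies a*x + b == c, using exact rational arithmetic."""
    return Fraction(a) * Fraction(x) + Fraction(b) == Fraction(c)


# Check candidate answers for 2x + 3 = 11.
print(check_linear_solution(2, 3, 11, 4))  # True, since 2*4 + 3 == 11
print(check_linear_solution(2, 3, 11, 5))  # False, since 2*5 + 3 == 13
```

Exact fractions avoid the rounding issues that floating-point comparison would introduce for answers such as 1/3.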

3. Real-Time Knowledge

The model’s knowledge is static, reflecting only the data it was trained on. It does not have real-time information about current events or post-training updates.

4. Safety and Harmlessness

Although safety-aligned, the model may occasionally generate responses that require human oversight. Regular monitoring is recommended when deploying it in sensitive domains.

5. Resource Requirements

Due to its size, running the model efficiently may require high-end computational resources, particularly for large-scale or real-time applications.
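
As a rough guide to what this means in practice, the memory needed just to hold the weights is the parameter count multiplied by the bytes per parameter. The sketch below assumes roughly 14 billion parameters, the published size of the base Phi-4; the exact count for this fine-tune is an assumption. Activations, the KV cache, and framework overhead come on top of this figure.

```python
def weight_memory_gb(num_parameters: float, bytes_per_parameter: int) -> float:
    """Estimate memory for the model weights alone, in decimal gigabytes."""
    return num_parameters * bytes_per_parameter / 1e9


# Assumption: about 14 billion parameters, as published for the base Phi-4.
ASSUMED_PARAMETERS = 14e9

print(weight_memory_gb(ASSUMED_PARAMETERS, 2))  # bfloat16, 2 bytes each: 28.0
print(weight_memory_gb(ASSUMED_PARAMETERS, 4))  # float32, 4 bytes each: 56.0
```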

6. Ethical Considerations

The model must not be used for malicious purposes, such as generating harmful content, misinformation, or spam. Users are responsible for ensuring ethical use.

7. Domain-Specific Limitations

Although effective in general-purpose reasoning and math tasks, the model may require further fine-tuning for highly specialized domains such as medicine, law, or finance.
