---
license: mit
datasets:
- O1-OPEN/OpenO1-SFT
language:
- en
base_model:
- microsoft/Phi-3.5-mini-instruct
---

# Phi-3.5-mini-instruct-o1

Phi-3.5-mini-instruct-o1 is a fine-tuned version of the [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) model, optimized for enhanced reasoning capabilities and robustness.

## Model Overview

Phi-3.5-mini-instruct-o1 is built on Phi-3.5-mini, a lightweight, state-of-the-art open model with 3.8B parameters. The base model supports a 128K-token context length and underwent a rigorous enhancement process to ensure precise instruction adherence and robust safety measures.
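For illustration, prompts follow the standard Phi-3.5 chat layout. The helper below (`build_phi35_prompt` is a hypothetical name, not part of any library) sketches that layout; in practice, prefer `tokenizer.apply_chat_template` from `transformers`, which applies the model's own template automatically:

```python
def build_phi35_prompt(messages):
    """Assemble a prompt in the Phi-3.5 chat layout (sketch only).

    Each turn becomes "<|role|>\n{content}<|end|>\n", and the prompt
    ends with "<|assistant|>\n" so generation continues as the reply.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = build_phi35_prompt([
    {"role": "user", "content": "What is 17 * 23? Show your reasoning."},
])
print(prompt)
```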

## Features

- **Enhanced Reasoning Process:** The model produces clear, traceable reasoning paths, making it easier to follow its thought process and spot potential mistakes.
- **Improved Multistep Reasoning:** Fine-tuned on O1 data, the model should show improved multistep reasoning and overall accuracy.
- **Specialized Capabilities:** Particularly well suited to tasks involving math, coding, and logic, in line with the strengths of the Phi-3.5 model family.
- **Robust Performance:** Fine-tuned with a high dropout rate to improve resilience and generalization.

## Limitations

- **Verbose Outputs:** As a chain-of-thought model, it may produce longer, more detailed responses than some applications need.
- **Potential Context Length Reduction:** Fine-tuning was performed at a 2048-token context length, which may affect the full 128K-token context window supported by the base model.
- **Quantization Challenges:** Standard llama.cpp quantizations, including 8-bit variants, are not compatible with this model.

## Training Details

The fine-tuning process for Phi-3.5-mini-instruct-o1 used the following techniques and parameters:

- **Method:** Low-Rank Adaptation (LoRA) with 4-bit quantization via BitsAndBytes
- **Dataset:** [O1-OPEN/OpenO1-SFT](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)
- **Batch Size:** 1, with 8 gradient accumulation steps (effective batch size 8)
- **Learning Rate:** 5e-5
- **Training Duration:** One epoch, limited to 10,000 samples
- **LoRA Configuration:** Rank 32, alpha 64, dropout 0.9
- **Advanced Techniques:** Shift attention, DoRA, RS-LoRA
- **Compute Type:** BF16
- **Context Length:** 2048 tokens
- **Optimizer:** AdamW with a cosine learning-rate schedule
- **Additional Enhancement:** NEFTune with alpha 5

This fine-tuning approach was designed to adapt the model efficiently while preserving its generalization ability and computational efficiency.
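The combination of options listed above (shift attention, DoRA, RS-LoRA, NEFTune) maps naturally onto a LLaMA-Factory-style LoRA recipe. The fragment below is a hypothetical reconstruction for orientation only, not the actual training file; key names follow LLaMA-Factory conventions:

```yaml
# Hypothetical sketch of the recipe; not the authors' config file.
model_name_or_path: microsoft/Phi-3.5-mini-instruct
finetuning_type: lora
quantization_bit: 4              # 4-bit base weights via BitsAndBytes
lora_rank: 32
lora_alpha: 64
lora_dropout: 0.9                # unusually high, for regularization
use_dora: true
use_rslora: true
shift_attn: true
neftune_noise_alpha: 5
per_device_train_batch_size: 1
gradient_accumulation_steps: 8   # effective batch size 8
learning_rate: 5.0e-5
num_train_epochs: 1
max_samples: 10000
cutoff_len: 2048
lr_scheduler_type: cosine
bf16: true
```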

## Intended Use

Phi-3.5-mini-instruct-o1 is suitable for commercial and research applications that require:

- Detailed reasoning and problem solving in math, coding, and logic tasks
- Transparent thought processes for analysis and debugging
- Robust performance across a variety of scenarios
- Efficient operation in memory- and compute-constrained environments

## Ethical Considerations

Users should be aware of potential biases in the model's outputs and exercise caution when deploying it in sensitive applications. Always verify the model's results, especially in critical decision-making processes.