---
license: mit
datasets:
- O1-OPEN/OpenO1-SFT
language:
- en
base_model:
- microsoft/Phi-3.5-mini-instruct
---

# Phi-3.5-mini-instruct-o1

Phi-3.5-mini-instruct-o1 is a fine-tuned version of the [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) model, optimized for enhanced reasoning capabilities and robustness.

## Model Overview

Phi-3.5-mini-instruct-o1 is built on Phi-3.5-mini, a lightweight, state-of-the-art open model with 3.8B parameters. The base model supports a 128K-token context length and underwent a rigorous enhancement process to ensure precise instruction adherence and robust safety measures.
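For illustration, prompts follow the standard Phi-3.5 chat layout. The helper below (`build_phi35_prompt` is a hypothetical name, not part of any library) sketches that layout; in practice, prefer `tokenizer.apply_chat_template` from `transformers`, which applies the model's own template automatically:

```python
def build_phi35_prompt(messages):
    """Assemble a prompt in the Phi-3.5 chat layout (sketch only).

    Each turn becomes "<|role|>\n{content}<|end|>\n", and the prompt
    ends with "<|assistant|>\n" so generation continues as the reply.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = build_phi35_prompt([
    {"role": "user", "content": "What is 17 * 23? Show your reasoning."},
])
print(prompt)
```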

## Features

- **Enhanced Reasoning Process:** The model produces clear, traceable reasoning paths, making it easier to follow its thought process and spot potential mistakes.
- **Improved Multistep Reasoning:** Fine-tuned on O1 data, the model should show improved multistep reasoning and overall accuracy.
- **Specialized Capabilities:** Particularly well suited to tasks involving math, coding, and logic, in line with the strengths of the Phi-3.5 model family.
- **Robust Performance:** Fine-tuned with a high dropout rate to improve resilience and generalization.

## Limitations

- **Verbose Outputs:** As a chain-of-thought model, it may produce longer, more detailed responses than some applications need.
- **Potential Context Length Reduction:** Fine-tuning was performed at a 2048-token context length, which may affect the full 128K-token context window supported by the base model.
- **Quantization Challenges:** Standard llama.cpp quantizations, including 8-bit variants, are not compatible with this model.

## Training Details

The fine-tuning process for Phi-3.5-mini-instruct-o1 used the following techniques and parameters:

- **Method:** Low-Rank Adaptation (LoRA) with 4-bit quantization via BitsAndBytes
- **Dataset:** [O1-OPEN/OpenO1-SFT](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)
- **Batch Size:** 1, with 8 gradient accumulation steps (effective batch size 8)
- **Learning Rate:** 5e-5
- **Training Duration:** One epoch, limited to 10,000 samples
- **LoRA Configuration:** Rank 32, alpha 64, dropout 0.9
- **Advanced Techniques:** Shift attention, DoRA, RS-LoRA
- **Compute Type:** BF16
- **Context Length:** 2048 tokens
- **Optimizer:** AdamW with a cosine learning-rate schedule
- **Additional Enhancement:** NEFTune with alpha 5

This fine-tuning approach was designed to adapt the model efficiently while preserving its generalization ability and computational efficiency.
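The combination of options listed above (shift attention, DoRA, RS-LoRA, NEFTune) maps naturally onto a LLaMA-Factory-style LoRA recipe. The fragment below is a hypothetical reconstruction for orientation only, not the actual training file; key names follow LLaMA-Factory conventions:

```yaml
# Hypothetical sketch of the recipe; not the authors' config file.
model_name_or_path: microsoft/Phi-3.5-mini-instruct
finetuning_type: lora
quantization_bit: 4              # 4-bit base weights via BitsAndBytes
lora_rank: 32
lora_alpha: 64
lora_dropout: 0.9                # unusually high, for regularization
use_dora: true
use_rslora: true
shift_attn: true
neftune_noise_alpha: 5
per_device_train_batch_size: 1
gradient_accumulation_steps: 8   # effective batch size 8
learning_rate: 5.0e-5
num_train_epochs: 1
max_samples: 10000
cutoff_len: 2048
lr_scheduler_type: cosine
bf16: true
```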

## Intended Use

Phi-3.5-mini-instruct-o1 is suitable for commercial and research applications that require:

- Detailed reasoning and problem solving in math, coding, and logic tasks
- Transparent thought processes for analysis and debugging
- Robust performance across a variety of scenarios
- Efficient operation in memory- and compute-constrained environments

## Ethical Considerations

Users should be aware of potential biases in the model's outputs and exercise caution when deploying it in sensitive applications. Always verify the model's results, especially in critical decision-making processes.