agentlans committed · verified · commit a9a397d · 1 parent: fb240a0

Update README.md

Files changed (1): README.md (+55 −3)
---
license: mit
---
# Phi-3.5-mini-instruct-o1

Phi-3.5-mini-instruct-o1 is a fine-tuned version of the [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) model, optimized for enhanced reasoning capabilities and robustness.

## Model Overview

Phi-3.5-mini-instruct-o1 is built on Phi-3.5-mini, a lightweight, state-of-the-art open model with 3.8B parameters. The base model supports a 128K token context length and was post-trained for precise instruction adherence and robust safety.

## Features

- **Enhanced Reasoning Process:** The model produces clear, traceable reasoning paths, making it easier to follow its thought process and spot potential mistakes.
- **Improved Multistep Reasoning:** Fine-tuned on the OpenO1-SFT dataset, the model demonstrates enhanced multistep reasoning and overall accuracy.
- **Specialized Capabilities:** Particularly well suited to math, coding, and logic tasks, in line with the strengths of the Phi-3.5 model family.
- **Robust Performance:** Fine-tuned with a high dropout rate to improve resilience and generalization.

## Limitations

- **Verbose Outputs:** As a chain-of-thought model, its responses may be longer and more detailed than some applications need.
- **Potential Context Length Reduction:** Fine-tuning at a 2048-token context length may have reduced the effective 128K token context supported by the base model.
- **Quantization Challenges:** Standard llama.cpp quantizations, including 8-bit versions, are not compatible with this model.

## Training Details

The fine-tuning process for Phi-3.5-mini-instruct-o1 used the following techniques and parameters:

- **Method:** Low-Rank Adaptation (LoRA) with 4-bit quantization via BitsAndBytes
- **Dataset:** OpenO1-SFT
- **Batch Size:** 1, with 8 gradient accumulation steps (effective batch size 8)
- **Learning Rate:** 5e-5
- **Training Duration:** Single epoch, limited to 10,000 samples
- **LoRA Configuration:** Rank 32, alpha 64, dropout 0.9
- **Advanced Techniques:** Shift attention, DoRA, RS-LoRA
- **Compute Type:** BF16
- **Context Length:** 2048 tokens
- **Optimizer:** AdamW with cosine learning rate scheduling
- **Layer Freezing:** Two layers frozen during training
- **Additional Enhancement:** NEFTune with alpha 5

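The LoRA numbers above can be sanity-checked with a little arithmetic: a rank-r adapter on a d_in × d_out weight adds r·(d_in + d_out) parameters, and RS-LoRA changes the update scaling from alpha/r to alpha/√r. A minimal sketch; the 3072 hidden size is Phi-3-mini's published dimension, used here purely for illustration:

```python
import math

def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Extra parameters a LoRA adapter adds to one (d_in x d_out) weight:
    an A matrix of shape (rank, d_in) plus a B matrix of shape (d_out, rank)."""
    return rank * (d_in + d_out)

rank, alpha = 32, 64                 # values from the training details above

# Plain LoRA scales the low-rank update by alpha / rank; RS-LoRA (used here)
# scales by alpha / sqrt(rank), which keeps update magnitudes stable at higher ranks.
plain_scale = alpha / rank           # 2.0
rs_scale = alpha / math.sqrt(rank)   # ~11.31

# Illustrative only: a square 3072 x 3072 projection gains this many adapter
# parameters at rank 32.
extra = lora_param_count(3072, 3072, rank)
print(plain_scale, round(rs_scale, 2), extra)  # 2.0 11.31 196608
```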
This fine-tuning approach was designed to adapt the model efficiently while preserving its generalization ability and computational efficiency.

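NEFTune, listed above, perturbs token embeddings during training with uniform noise scaled by alpha / √(seq_len · hidden_dim). A pure-Python sketch of the idea (lists stand in for tensors; real implementations operate on the embedding layer's output inside the training loop):

```python
import math
import random

def neftune_noise(embeddings, alpha=5.0):
    """Add NEFTune-style noise: uniform in [-1, 1], scaled by
    alpha / sqrt(seq_len * hidden_dim). `embeddings` is a
    seq_len x hidden_dim list of lists standing in for a real tensor."""
    seq_len, hidden_dim = len(embeddings), len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * hidden_dim)
    return [[x + random.uniform(-1.0, 1.0) * scale for x in row]
            for row in embeddings]

emb = [[0.0] * 16 for _ in range(8)]   # toy 8-token, 16-dim embedding matrix
noisy = neftune_noise(emb, alpha=5.0)  # alpha 5 matches the setting above

# No element can move further than the scale bound.
max_bound = 5.0 / math.sqrt(8 * 16)
assert all(abs(x) <= max_bound for row in noisy for x in row)
```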
## Intended Use

Phi-3.5-mini-instruct-o1 is suitable for commercial and research applications that require:

- Detailed reasoning and problem solving in math, coding, and logic tasks
- Transparent thought processes for analysis and debugging
- Robust performance across a variety of scenarios
- Efficient operation in memory- and compute-constrained environments

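When prompting the model, the chat format is presumably inherited from the base Phi-3.5 model, which wraps each turn in `<|role|>` tags terminated by `<|end|>`; in practice, prefer the tokenizer's `apply_chat_template`, which applies the authoritative template. A hypothetical helper, shown only to illustrate the assumed format:

```python
def build_phi3_prompt(messages):
    """Hypothetical helper: assemble a Phi-3-style chat prompt.
    Assumes the base model's <|role|> ... <|end|> turn format; verify
    against the tokenizer's chat template before relying on it."""
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    return "".join(parts) + "<|assistant|>\n"

prompt = build_phi3_prompt([
    {"role": "user", "content": "What is 17 * 24? Show your reasoning step by step."}
])
# The prompt ends with the assistant tag so generation continues from there.
assert prompt.endswith("<|assistant|>\n")
```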
## Ethical Considerations

Users should be aware of potential biases in the model's outputs and exercise caution when deploying it in sensitive applications. Always verify the model's results, especially in critical decision-making processes.