pradeep6kumar2024 committed on
Commit 93ed937 · 1 Parent(s): c6e01dd

added readme and app.py

Files changed (1)
  1. README.md +3 -133
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Phi-2 QLoRA Assistant
  emoji: 🤖
  colorFrom: blue
  colorTo: purple
@@ -9,136 +9,6 @@ app_file: app.py
  pinned: false
  ---

- # Phi-2 QLoRA Fine-tuned Assistant (CPU-Optimized)

- This is a lightweight, CPU-optimized version of Microsoft's Phi-2 model fine-tuned using the QLoRA (Quantized Low-Rank Adaptation) technique. The model has been optimized to run efficiently in CPU environments while still providing helpful responses for coding, explanations, and writing tasks.
-
- ## Model Description
-
- **Base Model**: Microsoft Phi-2
- **Training Method**: QLoRA (Quantized Low-Rank Adaptation); a sketch of a typical adapter setup follows below
- **Optimization**: CPU-optimized with reduced parameters
- **Primary Use Cases**: Code generation, technical explanations, and professional writing
-
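- The adapter follows the standard QLoRA recipe: the base model is loaded in 4-bit precision and small low-rank adapter matrices are trained on top of it. A minimal sketch of such a setup with `peft` and `bitsandbytes` is shown below; the rank, alpha, and target modules are illustrative assumptions, not the exact values used for this adapter.
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM, BitsAndBytesConfig
- from peft import LoraConfig, get_peft_model
-
- # 4-bit NF4 quantization: the "Q" in QLoRA (illustrative settings)
- bnb_config = BitsAndBytesConfig(
-     load_in_4bit=True,
-     bnb_4bit_quant_type="nf4",
-     bnb_4bit_compute_dtype=torch.bfloat16,
- )
- base = AutoModelForCausalLM.from_pretrained(
-     "microsoft/phi-2", quantization_config=bnb_config, device_map="auto"
- )
-
- # Low-rank adapters on the attention projections (assumed module names)
- lora_config = LoraConfig(
-     r=16,
-     lora_alpha=32,
-     target_modules=["q_proj", "k_proj", "v_proj", "dense"],
-     lora_dropout=0.05,
-     task_type="CAUSAL_LM",
- )
- model = get_peft_model(base, lora_config)
- model.print_trainable_parameters()  # only the adapter weights are trainable
- ```
-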
- ## Usage Tips
-
- ### For Code Generation (Temperature: 0.3-0.5)
- ```text
- # Example prompt:
- "Write a Python function to calculate factorial"
- ```
-
- ### For Technical Explanations (Temperature: 0.4-0.5)
- ```text
- # Example prompt:
- "Explain machine learning simply"
- ```
-
- ### For Professional Writing (Temperature: 0.4-0.6)
- ```text
- # Example prompt:
- "Write a short email to schedule a meeting"
- ```
-
- ## Parameters Guide (CPU-Optimized)
-
- **Maximum Length**: 64-256 (default: 192)
- Keep this low (128-192) for faster responses on CPU
- Higher values will significantly slow down generation
-
- **Temperature**: 0.1-0.7 (default: 0.4)
- 0.3-0.4: Best for code generation
- 0.4-0.5: Best for explanations
- 0.5-0.6: Best for creative writing
-
- **Top P**: 0.5-0.9 (default: 0.8)
- Controls the diversity of word choices
- Lower values = more focused responses (see the sketch below)
-
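- A minimal sketch of how these presets translate into `generate()` calls; the helper and preset values are illustrative, and `do_sample=True` is required for temperature/top-p to have any effect:
-
- ```python
- # Task-specific presets mirroring the guide above (illustrative values)
- PRESETS = {
-     "code":    {"temperature": 0.35, "top_p": 0.8, "max_new_tokens": 192},
-     "explain": {"temperature": 0.45, "top_p": 0.8, "max_new_tokens": 192},
-     "write":   {"temperature": 0.55, "top_p": 0.8, "max_new_tokens": 192},
- }
-
- def generate_with_preset(model, tokenizer, prompt, task="explain"):
-     inputs = tokenizer(prompt, return_tensors="pt")
-     outputs = model.generate(
-         **inputs,
-         do_sample=True,  # without this, temperature and top_p are ignored
-         **PRESETS[task],
-     )
-     return tokenizer.decode(outputs[0], skip_special_tokens=True)
- ```
-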
- ## Performance Notes
-
- This is a CPU-optimized version with the following considerations:
- Responses will be shorter than the GPU version's
- Generation takes longer on CPU (be patient)
- Memory usage is optimized for CPU environments
- Best for shorter, focused prompts
-
- ## Model Links
-
- **Model Card**: [pradeep6kumar2024/phi2-qlora-assistant](https://huggingface.co/pradeep6kumar2024/phi2-qlora-assistant)
- **Base Model**: [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)
-
- ## License
-
- This demo is released under the MIT License.
-
- ## Example Usage
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
- from peft import PeftModel
-
- # Load the base model and the QLoRA adapter (CPU-optimized)
- base_model = AutoModelForCausalLM.from_pretrained(
-     "microsoft/phi-2",
-     torch_dtype=torch.float32,  # use float32 on CPU
-     device_map="cpu",
-     low_cpu_mem_usage=True
- )
- model = PeftModel.from_pretrained(
-     base_model,
-     "pradeep6kumar2024/phi2-qlora-assistant",
-     torch_dtype=torch.float32,
-     device_map="cpu"
- )
- tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
-
- # Generate text (CPU-optimized settings)
- prompt = "Write a Python function to calculate factorial"
- inputs = tokenizer(prompt, return_tensors="pt")
- outputs = model.generate(
-     **inputs,
-     max_length=256,  # counts prompt tokens plus generated tokens
-     do_sample=True,  # required for temperature/top_p to take effect
-     temperature=0.4,
-     top_p=0.8,
-     num_beams=1  # single beam keeps CPU cost low
- )
- response = tokenizer.decode(outputs[0], skip_special_tokens=True)
- print(response)
- ```
-
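- On CPU, waiting for the full completion can feel slow. Streaming tokens to the console as they are produced makes the latency more tolerable; a minimal sketch using the `TextStreamer` utility from `transformers`, reusing `model`, `tokenizer`, and `inputs` from above:
-
- ```python
- from transformers import TextStreamer
-
- # Print tokens to stdout as they are generated, without echoing the prompt
- streamer = TextStreamer(tokenizer, skip_prompt=True)
- model.generate(
-     **inputs,
-     max_length=256,
-     do_sample=True,
-     temperature=0.4,
-     top_p=0.8,
-     streamer=streamer
- )
- ```
-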
- ## Example Outputs
-
- 1. **Coding Task**:
- ```python
- def factorial(n):
-     if n == 0 or n == 1:
-         return 1
-     return n * factorial(n - 1)
- ```
-
- 2. **Technical Explanation**:
- "Machine learning is a branch of artificial intelligence that enables computers to learn from data without being explicitly programmed. It works by analyzing patterns in data and making predictions based on those patterns."
-
- 3. **Professional Writing**:
- "Subject: Team Meeting Request
-
- Hi Team,
-
- I'd like to schedule a meeting next week to discuss our current project. Please let me know your availability.
-
- Thanks,
- [Your Name]"
-
- ## Limitations
-
- The CPU version generates shorter responses than the GPU version
- Generation is slower in CPU environments
- Works best with clear, concise prompts
- Memory constraints may limit very complex generations
-
- ## Acknowledgments
-
- Microsoft for the Phi-2 base model
- Hugging Face for the transformers library and hosting
- The QLoRA paper authors for the fine-tuning technique
 
  ---
+ title: Phi2 QLoRA
  emoji: 🤖
  colorFrom: blue
  colorTo: purple

  pinned: false
  ---

+ # Phi-2 QLoRA Assistant

+ A CPU-optimized version of Microsoft's Phi-2 model fine-tuned with QLoRA.