pradeep6kumar2024 committed
Commit 3cbb88b · 1 Parent(s): a27324e

added readme and app.py

Files changed (2)
  1. README.md +63 -36
  2. app.py +3 -3
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-title: Phi-2 QLoRA Assistant Demo
+title: Phi-2 QLoRA Assistant Demo (CPU-Optimized)
 emoji: 🤖
 colorFrom: blue
 colorTo: purple
@@ -9,51 +9,59 @@ app_file: app.py
 pinned: false
 ---
 
-# Phi-2 QLoRA Fine-tuned Assistant
+# Phi-2 QLoRA Fine-tuned Assistant (CPU-Optimized)
 
-This is a fine-tuned version of Microsoft's Phi-2 model using QLoRA (Quantized Low-Rank Adaptation) technique. The model has been trained to excel at various tasks including coding, technical explanations, and professional writing.
+This is a lightweight, CPU-optimized version of Microsoft's Phi-2 model, fine-tuned with the QLoRA (Quantized Low-Rank Adaptation) technique. It is tuned to run efficiently in CPU environments while still providing helpful responses for coding, explanation, and writing tasks.
 
 ## Model Description
 
 - **Base Model**: Microsoft Phi-2
 - **Training Method**: QLoRA (Quantized Low-Rank Adaptation)
-- **Training Data**: Custom dataset focused on coding, technical explanations, and professional communication
-- **Primary Use Cases**: Code generation, technical writing, and professional communication
+- **Optimization**: CPU-oriented defaults with reduced generation parameters
+- **Primary Use Cases**: Code generation, technical explanations, and professional writing
 
 ## Usage Tips
 
 ### For Code Generation (Temperature: 0.3-0.5)
 ```python
 # Example prompt:
-"Write a Python function to calculate the factorial of a number and provide additional recursive function examples"
+"Write a Python function to calculate factorial"
 ```
 
-### For Technical Explanations (Temperature: 0.7)
+### For Technical Explanations (Temperature: 0.4-0.5)
 ```text
 # Example prompt:
-"Explain what machine learning is in simple terms and provide some real-world applications"
+"Explain machine learning simply"
 ```
 
-### For Professional Writing (Temperature: 0.7-0.9)
+### For Professional Writing (Temperature: 0.4-0.6)
 ```text
 # Example prompt:
-"Write a professional email to schedule a team meeting for next week to discuss project progress"
+"Write a short email to schedule a meeting"
 ```
 
-## Parameters Guide
+## Parameters Guide (CPU-Optimized)
 
-- **Maximum Length**: 64-1024 (default: 512)
-  - Increase for longer responses
-  - Decrease for quicker, more concise responses
+- **Maximum Length**: 64-256 (default: 192)
+  - Keep this low (128-192) for faster responses on CPU
+  - Higher values will significantly slow down generation
 
-- **Temperature**: 0.1-1.0 (default: 0.7)
-  - 0.3-0.5: Best for code generation
-  - 0.7-0.9: Best for creative writing
-  - 1.0: Maximum creativity
+- **Temperature**: 0.1-0.7 (default: 0.4)
+  - 0.3-0.4: Best for code generation
+  - 0.4-0.5: Best for explanations
+  - 0.5-0.6: Best for creative writing
 
-- **Top P**: 0.1-1.0 (default: 0.9)
+- **Top P**: 0.5-0.9 (default: 0.8)
   - Controls diversity of word choices
-  - Higher values = more diverse vocabulary
+  - Lower values = more focused responses
+
+## Performance Notes
+
+This is a CPU-optimized version with the following considerations:
+- Responses will be shorter than the GPU version
+- Generation takes longer on CPU (be patient)
+- Memory usage is optimized for CPU environments
+- Best for shorter, focused prompts
 
 ## Model Links
 
@@ -70,15 +78,31 @@ This demo is released under the MIT License.
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from peft import PeftModel
 
-# Load base model and adapter
-base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
-model = PeftModel.from_pretrained(base_model, "pradeep6kumar2024/phi2-qlora-assistant")
+# Load base model and adapter (CPU optimized)
+base_model = AutoModelForCausalLM.from_pretrained(
+    "microsoft/phi-2",
+    torch_dtype=torch.float32,  # Use float32 for CPU
+    device_map="cpu",
+    low_cpu_mem_usage=True
+)
+model = PeftModel.from_pretrained(
+    base_model,
+    "pradeep6kumar2024/phi2-qlora-assistant",
+    torch_dtype=torch.float32,
+    device_map="cpu"
+)
 tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
 
-# Generate text
-prompt = "Write a Python function to calculate the factorial of a number"
+# Generate text (CPU optimized)
+prompt = "Write a Python function to calculate factorial"
 inputs = tokenizer(prompt, return_tensors="pt")
-outputs = model.generate(**inputs, max_length=512)
+outputs = model.generate(
+    **inputs,
+    max_length=256,
+    temperature=0.4,
+    top_p=0.8,
+    num_beams=1  # Greedy decoding for CPU
+)
 response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 ```
 
@@ -93,21 +117,24 @@ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 ```
 
 2. **Technical Explanation**:
-"Machine learning is a branch of artificial intelligence that enables computers to learn from data without being explicitly programmed. Think of it like teaching a child - instead of giving them strict rules, you show them examples and they learn to recognize patterns..."
+"Machine learning is a branch of artificial intelligence that enables computers to learn from data without being explicitly programmed. It works by analyzing patterns in data and making predictions based on those patterns."
 
 3. **Professional Writing**:
-"Dear Team,
-I hope this email finds you well. I would like to schedule a team meeting next week to discuss our project progress..."
+"Subject: Team Meeting Request
+
+Hi Team,
+
+I'd like to schedule a meeting next week to discuss our current project. Please let me know your availability.
+
+Thanks,
+[Your Name]"
 
 ## Limitations
 
-- The model works best with clear, well-structured prompts
-- Code generation is optimized for Python but can handle other languages
-- Response quality may vary with very long or complex prompts
-
-## Try It Out
-
-You can try this model directly in your browser using our Gradio Space: [Phi2-QLoRA-Assistant Demo](https://huggingface.co/spaces/pradeep6kumar2024/phi2-qlora-assistant-demo)
+- The CPU version generates shorter responses than the GPU version
+- Generation is slower in CPU environments
+- Works best with clear, concise prompts
+- Memory constraints may limit very complex generations
 
 ## Acknowledgments
 
 
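A note on the usage snippet added in this commit: it references `torch.float32` without importing `torch`, and it passes `temperature`/`top_p` to `generate()` without `do_sample=True`, so under the default greedy decoding those settings would be ignored. A runnable corrected sketch (model and adapter IDs are taken from this README; generation settings follow the Parameters Guide):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model on CPU in float32, then attach the QLoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float32,  # float32 is the safe dtype on CPU
    device_map="cpu",
    low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(base_model, "pradeep6kumar2024/phi2-qlora-assistant")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

prompt = "Write a Python function to calculate factorial"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=256,
    do_sample=True,   # required for temperature/top_p to take effect
    temperature=0.4,
    top_p=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```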
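The Model Description names QLoRA as the training method, but the commit does not include the training setup. For orientation, a typical QLoRA configuration with `peft` and `bitsandbytes` looks like the sketch below; the rank, alpha, dropout, and target modules are illustrative assumptions, not the values used for this adapter, and 4-bit loading requires a CUDA-capable GPU with `bitsandbytes` installed:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the frozen base model: the "Q" in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
base = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", quantization_config=bnb_config, device_map="auto"
)
base = prepare_model_for_kbit_training(base)

# Low-rank adapters on the attention projections (hyperparameters are illustrative)
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```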
app.py CHANGED
@@ -240,9 +240,9 @@ demo = gr.Interface(
             0.8
         ]
     ],
-    cache_examples=False
+    cache_examples=False,
+    concurrency_limit=1  # Use the correct parameter for limiting concurrency
 )
 
 if __name__ == "__main__":
-    demo.queue(concurrency_count=1)  # Limit concurrency
-    demo.launch()
+    demo.launch(max_threads=1)  # Limit the number of worker threads
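
For context on the app.py change: Gradio 4.x removed `queue(concurrency_count=...)` in favor of a `concurrency_limit` argument, and `launch(max_threads=...)` caps the worker thread pool. A minimal sketch of the pattern, with a placeholder handler standing in for the Space's actual generate function:

```python
import gradio as gr

def respond(prompt: str) -> str:
    # Placeholder for the Space's real generation function
    return f"Echo: {prompt}"

demo = gr.Interface(
    fn=respond,
    inputs="text",
    outputs="text",
    cache_examples=False,
    concurrency_limit=1,  # at most one request processed at a time (Gradio 4.x)
)

if __name__ == "__main__":
    demo.launch(max_threads=1)  # cap the worker thread pool on a small CPU Space
```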