Spaces: pradeep6kumar2024/QLORA_phi2


pradeep6kumar2024 committed on Mar 3

Commit 3cbb88b (1 parent: a27324e)

added readme and app.py

Files changed (2)

README.md +63 -36
app.py +3 -3

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: Phi-2 QLoRA Assistant Demo
 emoji: 🤖
 colorFrom: blue
 colorTo: purple
@@ -9,51 +9,59 @@ app_file: app.py
 pinned: false
 ---
-# Phi-2 QLoRA Fine-tuned Assistant
-This is a fine-tuned version of Microsoft's Phi-2 model using QLoRA (Quantized Low-Rank Adaptation) technique. The model has been trained to excel at various tasks including coding, technical explanations, and professional writing.
 ## Model Description
 - **Base Model**: Microsoft Phi-2
 - **Training Method**: QLoRA (Quantized Low-Rank Adaptation)
-- **Training Data**: Custom dataset focused on coding, technical explanations, and professional communication
-- **Primary Use Cases**: Code generation, technical writing, and professional communication
 ## Usage Tips
 ### For Code Generation (Temperature: 0.3-0.5)
 ```python
 # Example prompt:
-"Write a Python function to calculate the factorial of a number and provide additional recursive function examples"
 ```
-### For Technical Explanations (Temperature: 0.7)
 ```text
 # Example prompt:
-"Explain what machine learning is in simple terms and provide some real-world applications"
 ```
-### For Professional Writing (Temperature: 0.7-0.9)
 ```text
 # Example prompt:
-"Write a professional email to schedule a team meeting for next week to discuss project progress"
 ```
-## Parameters Guide
-- **Maximum Length**: 64-1024 (default: 512)
-  - Increase for longer responses
-  - Decrease for quicker, more concise responses
-- **Temperature**: 0.1-1.0 (default: 0.7)
-  - 0.3-0.5: Best for code generation
-  - 0.7-0.9: Best for creative writing
-  - 1.0: Maximum creativity
-- **Top P**: 0.1-1.0 (default: 0.9)
   - Controls diversity of word choices
-  - Higher values = more diverse vocabulary
 ## Model Links
@@ -70,15 +78,31 @@ This demo is released under the MIT License.
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from peft import PeftModel
-# Load base model and adapter
-base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
-model = PeftModel.from_pretrained(base_model, "pradeep6kumar2024/phi2-qlora-assistant")
 tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
-# Generate text
-prompt = "Write a Python function to calculate the factorial of a number"
 inputs = tokenizer(prompt, return_tensors="pt")
-outputs = model.generate(**inputs, max_length=512)
 response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 ```
@@ -93,21 +117,24 @@ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    ```
 2. **Technical Explanation**:
-   "Machine learning is a branch of artificial intelligence that enables computers to learn from data without being explicitly programmed. Think of it like teaching a child - instead of giving them strict rules, you show them examples and they learn to recognize patterns..."
 3. **Professional Writing**:
-   "Dear Team,
-   I hope this email finds you well. I would like to schedule a team meeting next week to discuss our project progress..."
 ## Limitations
-- The model works best with clear, well-structured prompts
-- Code generation is optimized for Python but can handle other languages
-- Response quality may vary with very long or complex prompts
-## Try It Out
-You can try this model directly in your browser using our Gradio Space: [Phi2-QLoRA-Assistant Demo](https://huggingface.co/spaces/pradeep6kumar2024/phi2-qlora-assistant-demo)
 ## Acknowledgments

 ---
+title: Phi-2 QLoRA Assistant Demo (CPU-Optimized)
 emoji: 🤖
 colorFrom: blue
 colorTo: purple
 pinned: false
 ---
+# Phi-2 QLoRA Fine-tuned Assistant (CPU-Optimized)
+This is a lightweight, CPU-optimized version of Microsoft's Phi-2 model, fine-tuned using the QLoRA (Quantized Low-Rank Adaptation) technique. The model has been optimized to run efficiently in CPU environments while still providing helpful responses for coding, explanations, and writing tasks.
 ## Model Description
 - **Base Model**: Microsoft Phi-2
 - **Training Method**: QLoRA (Quantized Low-Rank Adaptation)
+- **Optimization**: CPU-optimized with reduced generation settings (shorter maximum length, lower temperature)
+- **Primary Use Cases**: Code generation, technical explanations, and professional writing
 ## Usage Tips
 ### For Code Generation (Temperature: 0.3-0.5)
 ```python
 # Example prompt:
+"Write a Python function to calculate factorial"
 ```
+### For Technical Explanations (Temperature: 0.4-0.5)
 ```text
 # Example prompt:
+"Explain machine learning simply"
 ```
+### For Professional Writing (Temperature: 0.4-0.6)
 ```text
 # Example prompt:
+"Write a short email to schedule a meeting"
 ```
+## Parameters Guide (CPU-Optimized)
+- **Maximum Length**: 64-256 (default: 192)
+  - Keep this low (128-192) for faster responses on CPU
+  - Higher values will significantly slow down generation
+- **Temperature**: 0.1-0.7 (default: 0.4)
+  - 0.3-0.4: Best for code generation
+  - 0.4-0.5: Best for explanations
+  - 0.5-0.6: Best for creative writing
+- **Top P**: 0.5-0.9 (default: 0.8)
   - Controls diversity of word choices
+  - Lower values = more focused responses
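To make the effect of the Temperature and Top P settings above concrete, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a toy set of token scores. The function names and the example logits are illustrative assumptions; this is not how the Space or `transformers` implements sampling internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the peak."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


# Hypothetical scores for four candidate tokens.
logits = [2.0, 1.0, 0.5, -1.0]
focused = softmax_with_temperature(logits, 0.4)  # README default temperature
flatter = softmax_with_temperature(logits, 0.7)  # upper end of the README range
nucleus = top_p_filter(focused, 0.8)             # README default Top P
```

With these toy scores, the 0.4 default puts most of the probability on the top token, and a Top P of 0.8 then keeps only that token. That matches the guide's note that lower values give more focused responses.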
+## Performance Notes
+This is a CPU-optimized version with the following considerations:
+- Responses will be shorter than the GPU version
+- Generation takes longer on CPU (be patient)
+- Memory usage is optimized for CPU environments
+- Best for shorter, focused prompts
 ## Model Links
+import torch  # needed for torch.float32 below
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from peft import PeftModel
+# Load base model and adapter (CPU optimized)
+base_model = AutoModelForCausalLM.from_pretrained(
+    "microsoft/phi-2",
+    torch_dtype=torch.float32,  # Use float32 for CPU
+    device_map="cpu",
+    low_cpu_mem_usage=True
+)
+model = PeftModel.from_pretrained(
+    base_model,
+    "pradeep6kumar2024/phi2-qlora-assistant",
+    torch_dtype=torch.float32,
+    device_map="cpu"
+)
 tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
+# Generate text (CPU optimized)
+prompt = "Write a Python function to calculate factorial"
 inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(
+    **inputs,
+    max_length=256,
+    do_sample=True,  # temperature and top_p are only used when sampling
+    temperature=0.4,
+    top_p=0.8,
+    num_beams=1  # No beam search, to keep CPU cost low
+)
 response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 ```
    ```
 2. **Technical Explanation**:
+   "Machine learning is a branch of artificial intelligence that enables computers to learn from data without being explicitly programmed. It works by analyzing patterns in data and making predictions based on those patterns."
 3. **Professional Writing**:
+   "Subject: Team Meeting Request
+   Hi Team,
+   I'd like to schedule a meeting next week to discuss our current project. Please let me know your availability.
+   Thanks,
+   [Your Name]"
 ## Limitations
+- CPU version generates shorter responses than GPU version
+- Generation is slower on CPU environments
+- Works best with clear, concise prompts
+- Memory constraints may limit very complex generations
 ## Acknowledgments
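
The loading snippet in the updated README passes `temperature` and `top_p` to `generate()`. In `transformers`, those two settings only take effect when sampling is enabled with `do_sample=True`; otherwise decoding is greedy. The sketch below is a hypothetical helper (`build_generation_kwargs` is not part of the Space) that clamps user input to the ranges listed in the Parameters Guide and returns keyword arguments that could be passed as `model.generate(**inputs, **kwargs)`.

```python
# Ranges taken from the README's Parameters Guide (CPU-optimized).
MAX_LENGTH_RANGE = (64, 256)
TEMPERATURE_RANGE = (0.1, 0.7)
TOP_P_RANGE = (0.5, 0.9)


def clamp(value, bounds):
    """Restrict value to the inclusive (low, high) bounds."""
    low, high = bounds
    return max(low, min(high, value))


def build_generation_kwargs(max_length=192, temperature=0.4, top_p=0.8):
    """Hypothetical helper: map UI values to generate() keyword arguments."""
    return {
        "max_length": int(clamp(max_length, MAX_LENGTH_RANGE)),
        "temperature": float(clamp(temperature, TEMPERATURE_RANGE)),
        "top_p": float(clamp(top_p, TOP_P_RANGE)),
        "do_sample": True,  # required for temperature and top_p to be used
        "num_beams": 1,     # no beam search, which keeps CPU cost down
    }


defaults = build_generation_kwargs()
clamped = build_generation_kwargs(max_length=1024, temperature=1.5, top_p=0.1)
```

Out-of-range requests, such as the old GPU-oriented maximum length of 1024, fall back to the nearest CPU-friendly bound instead of slowing generation.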

app.py CHANGED

@@ -240,9 +240,9 @@ demo = gr.Interface(
             0.8
         ]
     ],
-    cache_examples=False
 )
 if __name__ == "__main__":
-    demo.queue(concurrency_count=1)  # Limit concurrency
-    demo.launch()

             0.8
         ]
     ],
+    cache_examples=False,
+    concurrency_limit=1  # Use the correct parameter for limiting concurrency
 )
 if __name__ == "__main__":
+    demo.launch(max_threads=1)  # Limit the number of worker threads
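
The app.py change appears to track the Gradio 4 API, where `queue(concurrency_count=...)` is deprecated and the limit is set with `concurrency_limit` on the interface or event listener instead. As a rough sketch, assuming the app needed to run under either major version, a small hypothetical helper (`concurrency_settings` is not part of the Space) could choose where the limit goes based on the installed version string:

```python
def concurrency_settings(gradio_version, limit=1):
    """Hypothetical helper: pick the keyword arguments that carry the concurrency limit.

    Assumes Gradio 4+ takes `concurrency_limit` on gr.Interface and
    Gradio 3.x takes `concurrency_count` on demo.queue().
    """
    major = int(gradio_version.split(".")[0])
    if major >= 4:
        return {"interface": {"concurrency_limit": limit}, "queue": {}}
    return {"interface": {}, "queue": {"concurrency_count": limit}}


# Intended use (not executed here, since it needs gradio and the app's own function):
#   settings = concurrency_settings(gr.__version__)
#   demo = gr.Interface(fn=generate, inputs=inputs, outputs=outputs, **settings["interface"])
#   demo.queue(**settings["queue"]).launch(max_threads=1)
new_style = concurrency_settings("4.19.2")
old_style = concurrency_settings("3.50.2")
```

Pinning a single Gradio version in the Space's requirements would make this branching unnecessary; the helper only illustrates the difference between the removed and added lines.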