Commit 1494734
Parent(s): aba5997

added files for huggingface cli

Files changed:
- README.md +90 -10
- app.py +164 -0
- requirements.txt +18 -0
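The commit message refers to the Hugging Face CLI. For illustration only (this snippet is not part of the commit), the three files below could be pushed to the Space from a local checkout with the `huggingface_hub` Python API; the Space ID is an assumption taken from the demo link in the README.

```python
from huggingface_hub import HfApi

api = HfApi()  # assumes you are already authenticated, e.g. via `huggingface-cli login`

# Upload README.md, app.py and requirements.txt from the current directory to the Space
api.upload_folder(
    folder_path=".",
    repo_id="pradeep6kumar2024/phi2-qlora-assistant-demo",  # assumed Space ID from the README
    repo_type="space",
    commit_message="added files for huggingface cli",
    allow_patterns=["README.md", "app.py", "requirements.txt"],
)
```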
README.md
CHANGED
@@ -1,14 +1,94 @@

---
language: en
tags:
- phi-2
- qlora
- fine-tuning
- assistant
- coding
- writing
license: mit
datasets:
- custom
model-index:
- name: phi2-qlora-assistant
  results:
  - task: text-generation
    type: text-generation
    metrics:
    - name: accuracy
      type: accuracy
      value: N/A
---

# Phi-2 QLoRA Fine-tuned Assistant

This is a fine-tuned version of Microsoft's Phi-2 model using the QLoRA (Quantized Low-Rank Adaptation) technique. The model has been trained to excel at various tasks including coding, technical explanations, and professional writing.

## Model Description

- **Base Model**: Microsoft Phi-2
- **Training Method**: QLoRA (Quantized Low-Rank Adaptation)
- **Training Data**: Custom dataset focused on coding, technical explanations, and professional communication
- **Primary Use Cases**: Code generation, technical writing, and professional communication

## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and adapter
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = PeftModel.from_pretrained(base_model, "pradeep6kumar2024/phi2-qlora-assistant")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Generate text
prompt = "Write a Python function to calculate the factorial of a number"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
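Since the adapter was trained with QLoRA, the base model can also be loaded in 4-bit for lower-memory inference. This is a minimal sketch, not part of the original card: it assumes `bitsandbytes` is installed (it is listed in `requirements.txt`) and a CUDA-capable GPU is available.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# NF4 4-bit quantization config (assumption: mirrors a typical QLoRA inference setup)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the quantized base model, then attach the LoRA adapter on top
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "pradeep6kumar2024/phi2-qlora-assistant")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
```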
## Example Outputs

1. **Coding Task**:
```python
def factorial(n):
    if n == 0 or n == 1:
        return 1
    return n * factorial(n-1)
```

2. **Technical Explanation**:
"Machine learning is a branch of artificial intelligence that enables computers to learn from data without being explicitly programmed. Think of it like teaching a child - instead of giving them strict rules, you show them examples and they learn to recognize patterns..."

3. **Professional Writing**:
"Dear Team,
I hope this email finds you well. I would like to schedule a team meeting next week to discuss our project progress..."

## Parameters

- **Temperature**: Controls creativity (0.3-0.5 for code, 0.7-0.9 for writing)
- **Max Length**: Adjustable based on desired response length (64-1024)
- **Top P**: Controls response diversity (recommended: 0.9)
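These settings map directly onto `generate()` keyword arguments. As a minimal sketch (not from the original card), reusing the `model` and `tokenizer` loaded in the Example Usage section, a coding prompt with the recommended low temperature might look like this:

```python
# Lower temperature for code, per the guidance above
inputs = tokenizer("Write a Python function to reverse a string", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=512,   # 64-1024 depending on desired response length
    temperature=0.4,  # 0.3-0.5 for code; 0.7-0.9 for writing
    top_p=0.9,        # recommended default
    do_sample=True,   # sampling must be on for temperature/top_p to take effect
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```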
## Limitations

- The model works best with clear, well-structured prompts
- Code generation is optimized for Python but can handle other languages
- Response quality may vary with very long or complex prompts

## Try It Out

You can try this model directly in your browser using our Gradio Space: [Phi2-QLoRA-Assistant Demo](https://huggingface.co/spaces/pradeep6kumar2024/phi2-qlora-assistant-demo)

## License

This model is released under the MIT License.

## Acknowledgments

- Microsoft for the Phi-2 base model
- Hugging Face for the transformers library and hosting
- The QLoRA paper authors for the fine-tuning technique
app.py
ADDED
@@ -0,0 +1,164 @@

```python
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import time

# Configuration
BASE_MODEL = "microsoft/phi-2"
ADAPTER_MODEL = "pradeep6kumar2024/phi2-qlora-assistant"  # fine-tuned LoRA adapter repo


class ModelWrapper:
    def __init__(self):
        self.model = None
        self.tokenizer = None
        self.loaded = False

    def load_model(self):
        if not self.loaded:
            print("Loading model and tokenizer...")
            self.tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
            # Phi-2's tokenizer has no pad token by default; fall back to the EOS token
            if self.tokenizer.pad_token is None:
                self.tokenizer.pad_token = self.tokenizer.eos_token
            base_model = AutoModelForCausalLM.from_pretrained(
                BASE_MODEL,
                torch_dtype=torch.float16,
                device_map="auto",
                trust_remote_code=True
            )

            print("Loading LoRA adapter...")
            self.model = PeftModel.from_pretrained(base_model, ADAPTER_MODEL)
            self.loaded = True
            print("Model loading complete!")

    def generate_response(self, prompt, max_length=512, temperature=0.7, top_p=0.9):
        if not self.loaded:
            self.load_model()

        # Tokenize input and move it to the model's device
        inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}

        # Generate
        start_time = time.time()
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_length=max_length,
                temperature=temperature,
                top_p=top_p,
                do_sample=True,
                pad_token_id=self.tokenizer.pad_token_id,
                eos_token_id=self.tokenizer.eos_token_id
            )

        # Decode and strip the echoed prompt from the response
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        if response.startswith(prompt):
            response = response[len(prompt):].strip()

        generation_time = time.time() - start_time
        return response, generation_time


# Initialize model wrapper
model_wrapper = ModelWrapper()


def generate_text(prompt, max_length=512, temperature=0.7, top_p=0.9):
    """Gradio interface function"""
    try:
        response, gen_time = model_wrapper.generate_response(
            prompt,
            max_length=max_length,
            temperature=temperature,
            top_p=top_p
        )
        return f"Generated in {gen_time:.2f} seconds:\n\n{response}"
    except Exception as e:
        return f"Error generating response: {str(e)}"


# Create the Gradio interface
demo = gr.Interface(
    fn=generate_text,
    inputs=[
        gr.Textbox(
            label="Enter your prompt",
            placeholder="Type your prompt here...",
            lines=4
        ),
        gr.Slider(
            minimum=64,
            maximum=1024,
            value=512,
            step=64,
            label="Maximum Length",
            info="Longer values = longer responses but slower generation"
        ),
        gr.Slider(
            minimum=0.1,
            maximum=1.0,
            value=0.7,
            step=0.1,
            label="Temperature",
            info="Higher values = more creative, lower values = more focused"
        ),
        gr.Slider(
            minimum=0.1,
            maximum=1.0,
            value=0.9,
            step=0.1,
            label="Top P",
            info="Controls diversity of word choices"
        ),
    ],
    outputs=gr.Textbox(label="Generated Response", lines=8),
    title="Phi-2 QLoRA Fine-tuned Assistant",
    description="""This is a fine-tuned version of Microsoft's Phi-2 model using QLoRA.
The model has been trained to provide helpful responses for various tasks including coding, writing, and general assistance.

Example tasks:
- Writing Python functions and explaining code
- Explaining technical concepts in simple terms
- Drafting professional emails and documents

Tips:
- For code generation, use lower temperature (0.3-0.5)
- For creative writing, use higher temperature (0.7-0.9)
- Adjust max length based on how long you want the response to be
""",
    examples=[
        [
            "Write a Python function to calculate the factorial of a number and provide additional recursive function examples",
            512,
            0.5,
            0.9
        ],
        [
            "Explain what machine learning is in simple terms and provide some real-world applications",
            512,
            0.7,
            0.9
        ],
        [
            "Write a professional email to schedule a team meeting for next week to discuss project progress",
            512,
            0.7,
            0.9
        ],
        [
            "Write a Python function to implement binary search algorithm with detailed comments",
            512,
            0.5,
            0.9
        ],
        [
            "Explain the concept of object-oriented programming using a real-world analogy",
            512,
            0.7,
            0.9
        ]
    ],
    cache_examples=False
)

# Launch the Gradio app
if __name__ == "__main__":
    demo.launch()
```
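Once the Space is running, the same interface can also be called programmatically. A minimal sketch (not part of this commit), assuming the Space ID from the README's "Try It Out" link and Gradio's default `/predict` endpoint for an `Interface`; it requires the `gradio_client` package:

```python
from gradio_client import Client

# Connect to the public Space (assumed ID from the README)
client = Client("pradeep6kumar2024/phi2-qlora-assistant-demo")

# Arguments follow the Interface inputs: prompt, maximum length, temperature, top_p
result = client.predict(
    "Write a Python function to check whether a string is a palindrome",
    512,   # Maximum Length
    0.4,   # Temperature (lower for code)
    0.9,   # Top P
    api_name="/predict",
)
print(result)
```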
requirements.txt
ADDED
@@ -0,0 +1,18 @@

```
gradio==4.19.2
torch>=2.0.0
transformers>=4.36.0
peft>=0.7.0
accelerate>=0.25.0
bitsandbytes>=0.41.0
safetensors>=0.4.0
datasets>=2.14.0
wandb>=0.15.10
sentencepiece>=0.1.99
einops>=0.6.1
scipy>=1.11.3
tqdm>=4.66.1
huggingface_hub>=0.17.3
pandas>=2.0.0
numpy>=1.24.0
rouge-score>=0.1.2
nltk>=3.8.1
```