Commit 1494734
Parent(s): aba5997

added files for huggingface cli

Files changed:
- README.md +90 -10
- app.py +164 -0
- requirements.txt +18 -0
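The commit message refers to the Hugging Face CLI. For illustration only (this snippet is not part of the commit), the three files below could be pushed to the Space from a local checkout with the `huggingface_hub` Python API; the Space ID is an assumption taken from the demo link in the README.

```python
from huggingface_hub import HfApi

api = HfApi()  # assumes you are already authenticated, e.g. via `huggingface-cli login`

# Upload README.md, app.py and requirements.txt from the current directory to the Space
api.upload_folder(
    folder_path=".",
    repo_id="pradeep6kumar2024/phi2-qlora-assistant-demo",  # assumed Space ID from the README
    repo_type="space",
    commit_message="added files for huggingface cli",
    allow_patterns=["README.md", "app.py", "requirements.txt"],
)
```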
README.md
CHANGED
@@ -1,14 +1,94 @@

---
language: en
tags:
- phi-2
- qlora
- fine-tuning
- assistant
- coding
- writing
license: mit
datasets:
- custom
model-index:
- name: phi2-qlora-assistant
  results:
  - task: text-generation
    type: text-generation
    metrics:
    - name: accuracy
      type: accuracy
      value: N/A
---

# Phi-2 QLoRA Fine-tuned Assistant

This is a fine-tuned version of Microsoft's Phi-2 model using the QLoRA (Quantized Low-Rank Adaptation) technique. The model has been trained to excel at various tasks including coding, technical explanations, and professional writing.

## Model Description

- **Base Model**: Microsoft Phi-2
- **Training Method**: QLoRA (Quantized Low-Rank Adaptation)
- **Training Data**: Custom dataset focused on coding, technical explanations, and professional communication
- **Primary Use Cases**: Code generation, technical writing, and professional communication

## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and adapter
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = PeftModel.from_pretrained(base_model, "pradeep6kumar2024/phi2-qlora-assistant")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Generate text
prompt = "Write a Python function to calculate the factorial of a number"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
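Since the adapter was trained with QLoRA, the base model can also be loaded in 4-bit for lower-memory inference. This is a minimal sketch, not part of the original card: it assumes `bitsandbytes` is installed (it is listed in `requirements.txt`) and a CUDA-capable GPU is available.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# NF4 4-bit quantization config (assumption: mirrors a typical QLoRA inference setup)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the quantized base model, then attach the LoRA adapter on top
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "pradeep6kumar2024/phi2-qlora-assistant")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
```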
## Example Outputs

1. **Coding Task**:
```python
def factorial(n):
    if n == 0 or n == 1:
        return 1
    return n * factorial(n-1)
```

2. **Technical Explanation**:
"Machine learning is a branch of artificial intelligence that enables computers to learn from data without being explicitly programmed. Think of it like teaching a child - instead of giving them strict rules, you show them examples and they learn to recognize patterns..."

3. **Professional Writing**:
"Dear Team,
I hope this email finds you well. I would like to schedule a team meeting next week to discuss our project progress..."

## Parameters

- **Temperature**: Controls creativity (0.3-0.5 for code, 0.7-0.9 for writing)
- **Max Length**: Adjustable based on desired response length (64-1024)
- **Top P**: Controls response diversity (recommended: 0.9)
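These settings map directly onto `generate()` keyword arguments. As a minimal sketch (not from the original card), reusing the `model` and `tokenizer` loaded in the Example Usage section, a coding prompt with the recommended low temperature might look like this:

```python
# Lower temperature for code, per the guidance above
inputs = tokenizer("Write a Python function to reverse a string", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=512,   # 64-1024 depending on desired response length
    temperature=0.4,  # 0.3-0.5 for code; 0.7-0.9 for writing
    top_p=0.9,        # recommended default
    do_sample=True,   # sampling must be on for temperature/top_p to take effect
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```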
## Limitations

- The model works best with clear, well-structured prompts
- Code generation is optimized for Python but can handle other languages
- Response quality may vary with very long or complex prompts

## Try It Out

You can try this model directly in your browser using our Gradio Space: [Phi2-QLoRA-Assistant Demo](https://huggingface.co/spaces/pradeep6kumar2024/phi2-qlora-assistant-demo)

## License

This model is released under the MIT License.

## Acknowledgments

- Microsoft for the Phi-2 base model
- Hugging Face for the transformers library and hosting
- The QLoRA paper authors for the fine-tuning technique
app.py
ADDED
@@ -0,0 +1,164 @@

```python
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import time

# Configuration
BASE_MODEL = "microsoft/phi-2"
ADAPTER_MODEL = "pradeep6kumar2024/phi2-qlora-assistant"  # fine-tuned LoRA adapter repo


class ModelWrapper:
    def __init__(self):
        self.model = None
        self.tokenizer = None
        self.loaded = False

    def load_model(self):
        if not self.loaded:
            print("Loading model and tokenizer...")
            self.tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
            # Phi-2's tokenizer has no pad token by default; fall back to the EOS token
            if self.tokenizer.pad_token is None:
                self.tokenizer.pad_token = self.tokenizer.eos_token
            base_model = AutoModelForCausalLM.from_pretrained(
                BASE_MODEL,
                torch_dtype=torch.float16,
                device_map="auto",
                trust_remote_code=True
            )

            print("Loading LoRA adapter...")
            self.model = PeftModel.from_pretrained(base_model, ADAPTER_MODEL)
            self.loaded = True
            print("Model loading complete!")

    def generate_response(self, prompt, max_length=512, temperature=0.7, top_p=0.9):
        if not self.loaded:
            self.load_model()

        # Tokenize input and move it to the model's device
        inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}

        # Generate
        start_time = time.time()
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_length=max_length,
                temperature=temperature,
                top_p=top_p,
                do_sample=True,
                pad_token_id=self.tokenizer.pad_token_id,
                eos_token_id=self.tokenizer.eos_token_id
            )

        # Decode and strip the echoed prompt from the response
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        if response.startswith(prompt):
            response = response[len(prompt):].strip()

        generation_time = time.time() - start_time
        return response, generation_time


# Initialize model wrapper
model_wrapper = ModelWrapper()


def generate_text(prompt, max_length=512, temperature=0.7, top_p=0.9):
    """Gradio interface function"""
    try:
        response, gen_time = model_wrapper.generate_response(
            prompt,
            max_length=max_length,
            temperature=temperature,
            top_p=top_p
        )
        return f"Generated in {gen_time:.2f} seconds:\n\n{response}"
    except Exception as e:
        return f"Error generating response: {str(e)}"


# Create the Gradio interface
demo = gr.Interface(
    fn=generate_text,
    inputs=[
        gr.Textbox(
            label="Enter your prompt",
            placeholder="Type your prompt here...",
            lines=4
        ),
        gr.Slider(
            minimum=64,
            maximum=1024,
            value=512,
            step=64,
            label="Maximum Length",
            info="Longer values = longer responses but slower generation"
        ),
        gr.Slider(
            minimum=0.1,
            maximum=1.0,
            value=0.7,
            step=0.1,
            label="Temperature",
            info="Higher values = more creative, lower values = more focused"
        ),
        gr.Slider(
            minimum=0.1,
            maximum=1.0,
            value=0.9,
            step=0.1,
            label="Top P",
            info="Controls diversity of word choices"
        ),
    ],
    outputs=gr.Textbox(label="Generated Response", lines=8),
    title="Phi-2 QLoRA Fine-tuned Assistant",
    description="""This is a fine-tuned version of Microsoft's Phi-2 model using QLoRA.
The model has been trained to provide helpful responses for various tasks including coding, writing, and general assistance.

Example tasks:
- Writing Python functions and explaining code
- Explaining technical concepts in simple terms
- Drafting professional emails and documents

Tips:
- For code generation, use lower temperature (0.3-0.5)
- For creative writing, use higher temperature (0.7-0.9)
- Adjust max length based on how long you want the response to be
""",
    examples=[
        [
            "Write a Python function to calculate the factorial of a number and provide additional recursive function examples",
            512,
            0.5,
            0.9
        ],
        [
            "Explain what machine learning is in simple terms and provide some real-world applications",
            512,
            0.7,
            0.9
        ],
        [
            "Write a professional email to schedule a team meeting for next week to discuss project progress",
            512,
            0.7,
            0.9
        ],
        [
            "Write a Python function to implement binary search algorithm with detailed comments",
            512,
            0.5,
            0.9
        ],
        [
            "Explain the concept of object-oriented programming using a real-world analogy",
            512,
            0.7,
            0.9
        ]
    ],
    cache_examples=False
)

# Launch the Gradio app
if __name__ == "__main__":
    demo.launch()
```
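Once the Space is running, the same interface can also be called programmatically. A minimal sketch (not part of this commit), assuming the Space ID from the README's "Try It Out" link and Gradio's default `/predict` endpoint for an `Interface`; it requires the `gradio_client` package:

```python
from gradio_client import Client

# Connect to the public Space (assumed ID from the README)
client = Client("pradeep6kumar2024/phi2-qlora-assistant-demo")

# Arguments follow the Interface inputs: prompt, maximum length, temperature, top_p
result = client.predict(
    "Write a Python function to check whether a string is a palindrome",
    512,   # Maximum Length
    0.4,   # Temperature (lower for code)
    0.9,   # Top P
    api_name="/predict",
)
print(result)
```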
requirements.txt
ADDED
@@ -0,0 +1,18 @@

```
gradio==4.19.2
torch>=2.0.0
transformers>=4.36.0
peft>=0.7.0
accelerate>=0.25.0
bitsandbytes>=0.41.0
safetensors>=0.4.0
datasets>=2.14.0
wandb>=0.15.10
sentencepiece>=0.1.99
einops>=0.6.1
scipy>=1.11.3
tqdm>=4.66.1
huggingface_hub>=0.17.3
pandas>=2.0.0
numpy>=1.24.0
rouge-score>=0.1.2
nltk>=3.8.1
```