pradeep6kumar2024 committed
Commit 1494734 · 1 Parent(s): aba5997

added files for huggingface cli

Files changed (3)
  1. README.md +90 -10
  2. app.py +164 -0
  3. requirements.txt +18 -0
README.md CHANGED
@@ -1,14 +1,94 @@
  ---
- title: QLORA Phi2
- emoji: 📉
- colorFrom: blue
- colorTo: pink
- sdk: gradio
- sdk_version: 5.20.0
- app_file: app.py
- pinned: false
  license: mit
- short_description: This is a fine-tuned version of Microsoft's Phi-2 model
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ language: en
+ tags:
+ - phi-2
+ - qlora
+ - fine-tuning
+ - assistant
+ - coding
+ - writing
  license: mit
+ datasets:
+ - custom
+ model-index:
+ - name: phi2-qlora-assistant
+   results:
+   - task: text-generation
+     type: text-generation
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: N/A
  ---
 
+ # Phi-2 QLoRA Fine-tuned Assistant
+
+ This is a fine-tuned version of Microsoft's Phi-2 model using the QLoRA (Quantized Low-Rank Adaptation) technique. The model has been trained to excel at a variety of tasks, including coding, technical explanations, and professional writing.
+
+ ## Model Description
+
+ - **Base Model**: Microsoft Phi-2
+ - **Training Method**: QLoRA (Quantized Low-Rank Adaptation); a representative setup is sketched below
+ - **Training Data**: Custom dataset focused on coding, technical explanations, and professional communication
+ - **Primary Use Cases**: Code generation, technical writing, and professional communication
+
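+ The exact training configuration is not published in this card, so the following is a minimal, hypothetical sketch of a typical QLoRA setup: 4-bit quantization via `bitsandbytes` plus a PEFT `LoraConfig`. The rank, alpha, dropout, and target modules are illustrative assumptions, not the values used to train this checkpoint.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+
+ # Load the base model in 4-bit precision (the "Q" in QLoRA)
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.float16,
+ )
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "microsoft/phi-2",
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+ base_model = prepare_model_for_kbit_training(base_model)
+
+ # Attach low-rank adapters; r, alpha, and target modules below are
+ # illustrative assumptions, not this model's actual hyperparameters
+ lora_config = LoraConfig(
+     r=16,
+     lora_alpha=32,
+     lora_dropout=0.05,
+     target_modules=["q_proj", "k_proj", "v_proj", "dense"],
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(base_model, lora_config)
+ model.print_trainable_parameters()  # only the adapter weights are trainable
+ ```
+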
+ ## Example Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ # Load base model, LoRA adapter, and tokenizer
+ base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
+ model = PeftModel.from_pretrained(base_model, "pradeep6kumar2024/phi2-qlora-assistant")
+ tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
+
+ # Generate text
+ prompt = "Write a Python function to calculate the factorial of a number"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_length=512)
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(response)
+ ```
+
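+ For deployment, the adapter can optionally be merged into the base model so that no PEFT wrapper is needed at inference time. This is a standard PEFT step rather than something this model requires; the output directory below is a hypothetical example.
+
+ ```python
+ # Optional: fold the LoRA weights into the base model for standalone use
+ merged_model = model.merge_and_unload()
+ merged_model.save_pretrained("phi2-qlora-merged")  # hypothetical output path
+ tokenizer.save_pretrained("phi2-qlora-merged")
+ ```
+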
+ ## Example Outputs
+
+ 1. **Coding Task**:
+    ```python
+    def factorial(n):
+        if n == 0 or n == 1:
+            return 1
+        return n * factorial(n - 1)
+    ```
+
+ 2. **Technical Explanation**:
+    "Machine learning is a branch of artificial intelligence that enables computers to learn from data without being explicitly programmed. Think of it like teaching a child - instead of giving them strict rules, you show them examples and they learn to recognize patterns..."
+
+ 3. **Professional Writing**:
+    "Dear Team,
+    I hope this email finds you well. I would like to schedule a team meeting next week to discuss our project progress..."
+
+ ## Parameters
+
+ - **Temperature**: Controls creativity (0.3-0.5 for code, 0.7-0.9 for writing)
+ - **Max Length**: Adjustable based on desired response length (64-1024 tokens)
+ - **Top P**: Controls response diversity (recommended: 0.9); see the worked example after this list
+
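+ As a concrete illustration of these settings (reusing `model` and `tokenizer` from the usage example above; the prompt and the exact values are arbitrary choices within the recommended ranges):
+
+ ```python
+ # Focused settings for a coding prompt: low temperature, top_p = 0.9
+ inputs = tokenizer("Write a Python function to reverse a string", return_tensors="pt")
+ outputs = model.generate(
+     **inputs,
+     max_length=512,   # 64-1024 depending on desired response length
+     temperature=0.4,  # 0.3-0.5 for code; 0.7-0.9 for creative writing
+     top_p=0.9,        # recommended diversity setting
+     do_sample=True,   # sampling must be on for temperature/top_p to take effect
+ )
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+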
+ ## Limitations
+
+ - The model works best with clear, well-structured prompts
+ - Code generation is optimized for Python but can handle other languages
+ - Response quality may vary with very long or complex prompts
+
+ ## Try It Out
+
+ You can try this model directly in your browser using our Gradio Space: [Phi2-QLoRA-Assistant Demo](https://huggingface.co/spaces/pradeep6kumar2024/phi2-qlora-assistant-demo)
+
+ ## License
+
+ This model is released under the MIT License.
+
+ ## Acknowledgments
+
+ - Microsoft for the Phi-2 base model
+ - Hugging Face for the transformers library and hosting
+ - The QLoRA paper authors for the fine-tuning technique
app.py ADDED
@@ -0,0 +1,164 @@
+ import gradio as gr
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+ import time
+
+ # Configuration
+ BASE_MODEL = "microsoft/phi-2"
+ ADAPTER_MODEL = "pradeep6kumar2024/phi2-qlora-assistant"  # fine-tuned LoRA adapter repo
+
+ class ModelWrapper:
+     def __init__(self):
+         self.model = None
+         self.tokenizer = None
+         self.loaded = False
+
+     def load_model(self):
+         if not self.loaded:
+             print("Loading model and tokenizer...")
+             self.tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
+             # Phi-2's tokenizer has no pad token by default; fall back to EOS
+             if self.tokenizer.pad_token_id is None:
+                 self.tokenizer.pad_token = self.tokenizer.eos_token
+             base_model = AutoModelForCausalLM.from_pretrained(
+                 BASE_MODEL,
+                 torch_dtype=torch.float16,
+                 device_map="auto",
+                 trust_remote_code=True
+             )
+
+             print("Loading LoRA adapter...")
+             self.model = PeftModel.from_pretrained(base_model, ADAPTER_MODEL)
+             self.loaded = True
+             print("Model loading complete!")
+
+     def generate_response(self, prompt, max_length=512, temperature=0.7, top_p=0.9):
+         if not self.loaded:
+             self.load_model()
+
+         # Tokenize input
+         inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
+         inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
+
+         # Generate
+         start_time = time.time()
+         with torch.no_grad():
+             outputs = self.model.generate(
+                 **inputs,
+                 max_length=max_length,
+                 temperature=temperature,
+                 top_p=top_p,
+                 do_sample=True,
+                 pad_token_id=self.tokenizer.pad_token_id,
+                 eos_token_id=self.tokenizer.eos_token_id
+             )
+
+         # Decode response and strip the echoed prompt
+         response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
+         if response.startswith(prompt):
+             response = response[len(prompt):].strip()
+
+         generation_time = time.time() - start_time
+         return response, generation_time
+
+ # Initialize model wrapper
+ model_wrapper = ModelWrapper()
+
+ def generate_text(prompt, max_length=512, temperature=0.7, top_p=0.9):
+     """Gradio interface function"""
+     try:
+         response, gen_time = model_wrapper.generate_response(
+             prompt,
+             max_length=max_length,
+             temperature=temperature,
+             top_p=top_p
+         )
+         return f"Generated in {gen_time:.2f} seconds:\n\n{response}"
+     except Exception as e:
+         return f"Error generating response: {str(e)}"
+
+ # Create the Gradio interface
+ demo = gr.Interface(
+     fn=generate_text,
+     inputs=[
+         gr.Textbox(
+             label="Enter your prompt",
+             placeholder="Type your prompt here...",
+             lines=4
+         ),
+         gr.Slider(
+             minimum=64,
+             maximum=1024,
+             value=512,
+             step=64,
+             label="Maximum Length",
+             info="Longer values = longer responses but slower generation"
+         ),
+         gr.Slider(
+             minimum=0.1,
+             maximum=1.0,
+             value=0.7,
+             step=0.1,
+             label="Temperature",
+             info="Higher values = more creative, lower values = more focused"
+         ),
+         gr.Slider(
+             minimum=0.1,
+             maximum=1.0,
+             value=0.9,
+             step=0.1,
+             label="Top P",
+             info="Controls diversity of word choices"
+         ),
+     ],
+     outputs=gr.Textbox(label="Generated Response", lines=8),
+     title="Phi-2 QLoRA Fine-tuned Assistant",
+     description="""This is a fine-tuned version of Microsoft's Phi-2 model using QLoRA.
+ The model has been trained to provide helpful responses for various tasks including coding, writing, and general assistance.
+
+ Example tasks:
+ - Writing Python functions and explaining code
+ - Explaining technical concepts in simple terms
+ - Drafting professional emails and documents
+
+ Tips:
+ - For code generation, use lower temperature (0.3-0.5)
+ - For creative writing, use higher temperature (0.7-0.9)
+ - Adjust max length based on how long you want the response to be
+ """,
+     examples=[
+         [
+             "Write a Python function to calculate the factorial of a number and provide additional recursive function examples",
+             512,
+             0.5,
+             0.9
+         ],
+         [
+             "Explain what machine learning is in simple terms and provide some real-world applications",
+             512,
+             0.7,
+             0.9
+         ],
+         [
+             "Write a professional email to schedule a team meeting for next week to discuss project progress",
+             512,
+             0.7,
+             0.9
+         ],
+         [
+             "Write a Python function to implement binary search algorithm with detailed comments",
+             512,
+             0.5,
+             0.9
+         ],
+         [
+             "Explain the concept of object-oriented programming using a real-world analogy",
+             512,
+             0.7,
+             0.9
+         ]
+     ],
+     cache_examples=False
+ )
+
+ # Launch the Gradio app
+ if __name__ == "__main__":
+     demo.launch()
requirements.txt ADDED
@@ -0,0 +1,18 @@
+ gradio==4.19.2
+ torch>=2.0.0
+ transformers>=4.36.0
+ peft>=0.7.0
+ accelerate>=0.25.0
+ bitsandbytes>=0.41.0
+ safetensors>=0.4.0
+ datasets>=2.14.0
+ wandb>=0.15.10
+ sentencepiece>=0.1.99
+ einops>=0.6.1
+ scipy>=1.11.3
+ tqdm>=4.66.1
+ huggingface_hub>=0.17.3
+ pandas>=2.0.0
+ numpy>=1.24.0
+ rouge-score>=0.1.2
+ nltk>=3.8.1