Update README.md
README.md CHANGED
@@ -30,16 +30,7 @@ Example 1:

**llama-2-7b-dolphin-peft**:

-
-> 2. Take a stroll through Central Park and enjoy its many attractions.
-> 3. Explore the world-renowned museums, such as the Metropolitan Museum of Art and the Museum of Modern Art.
-> 4. Experience the vibrant energy of Times Square and take in the bright lights and billboards.
-> 5. Visit the 9/11 Memorial and Museum to pay tribute to those who lost their lives in the attacks.
-> 6. Enjoy a Broadway show or a concert at one of the many theaters and venues in the city.
-> 7. Take a ride on the Staten Island Ferry for a free view of the Statue of Liberty and the New York City skyline.
-> 8. Shop at the famous Fifth Avenue stores and explore the high-end boutiques.
-> 9. Indulge in a variety of cuisines at one of the many restaurants in the city.
-> 10. Visit the Empire State Building and enjoy the panoramic views of the city from the observation deck.
+coming

<br>

@@ -50,18 +41,7 @@ Example 2:

**llama-2-7b-dolphin-peft**:

-
->
-> Dear Friends,
->
-> I hope this email finds you well. I'm excited to invite you all to a dinner party on Friday, March 10th, at 7:00 PM. The address is 123 Main Street, Anytown, USA.
->
-> Please RSVP by Wednesday, March 8th, so I can plan accordingly. I look forward to seeing you all and sharing a delicious meal together!
->
-> Best,
-> Your Friendly Assistant
->
-> P.S. If you have any dietary restrictions or allergies, please let me know in your RSVP. Thank you!
+coming

<br>

@@ -82,8 +62,6 @@ The llama-2-7b models have been modified from a standard transformer in the foll

## Finetuning Description

-This model was trained on a single A6000 (48 GB) for about 18 hours using the [Lambda Labs](https://cloud.lambdalabs.com/instances) platform.
-
![loss curves](https://raw.githubusercontent.com/daniel-furman/sft-demos/main/assets/jul_24_23_1_13_00_log_loss_curves_llama-2-7b-dolphin.png)

The above loss curve was generated from the run's private wandb.ai log.
@@ -102,131 +80,18 @@ This model can produce factually incorrect output, and should not be relied on t

This model was trained on various public datasets.
While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

-## How to Use
-
-Basic usage: [notebook](assets/basic_inference_llama_2_7b_dolphin.ipynb)
-
-Install and import the package dependencies:
-
-```python
-!pip install -q -U huggingface_hub peft transformers torch accelerate
-```
-
-```python
-import torch
-from peft import PeftModel, PeftConfig
-from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
-```
-
-Sign into a HF account with access to Llama-2:
-
-```python
-from huggingface_hub import notebook_login
-notebook_login()
-```

-
-
-```python
-peft_model_id = "dfurman/llama-2-7b-dolphin-peft"
-config = PeftConfig.from_pretrained(peft_model_id)
-
-tokenizer = AutoTokenizer.from_pretrained(
-    config.base_model_name_or_path,
-    use_auth_token=True
-)
-tokenizer.pad_token = tokenizer.eos_token
-model = AutoModelForCausalLM.from_pretrained(
-    config.base_model_name_or_path,
-    torch_dtype=torch.bfloat16,
-    device_map="auto",
-    use_auth_token=True,
-)
-
-# Load the Lora model
-model = PeftModel.from_pretrained(model, peft_model_id)
-```
-
-Once loaded, the model and tokenizer can be used with the following code:
-
-```python
-def llama_generate(
-    model: AutoModelForCausalLM,
-    tokenizer: AutoTokenizer,
-    prompt: str,
-    max_new_tokens: int = 128,
-    temperature: float = 0.92,
-) -> str:
-    """
-    Initialize the pipeline
-    Uses Hugging Face GenerationConfig defaults
-        https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig
-    Args:
-        model (transformers.AutoModelForCausalLM): Falcon model for text generation
-        tokenizer (transformers.AutoTokenizer): Tokenizer for model
-        prompt (str): Prompt for text generation
-        max_new_tokens (int, optional): Max new tokens after the prompt to generate. Defaults to 128.
-        temperature (float, optional): The value used to modulate the next token probabilities.
-            Defaults to 1.0
-    """
-    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-
-    inputs = tokenizer(
-        [prompt],
-        return_tensors="pt",
-        return_token_type_ids=False,
-    ).to(
-        device
-    ) # tokenize inputs, load on device
-
-    # when running Torch modules in lower precision, it is best practice to use the torch.autocast context manager.
-    with torch.autocast("cuda", dtype=torch.bfloat16):
-        response = model.generate(
-            **inputs,
-            max_new_tokens=max_new_tokens,
-            temperature=temperature,
-            return_dict_in_generate=True,
-            eos_token_id=tokenizer.eos_token_id,
-            pad_token_id=tokenizer.pad_token_id,
-        )
-
-    decoded_output = tokenizer.decode(
-        response["sequences"][0],
-        skip_special_tokens=True,
-    ) # grab output in natural language
-
-    return decoded_output[len(prompt) :] # remove prompt from output
-```
-
-We can now generate text! For example:
-
-```python
-prompt = "### Human: Write me a numbered list of things to do in New York City.### Assistant: "
-
-response = llama_generate(
-    model,
-    tokenizer,
-    prompt,
-    max_new_tokens=250,
-    temperature=0.92,
-)
+## How to Use

-
-```
+coming

### Runtime tests

-
-| runtime / 50 tokens (sec) | GPU | attn | torch dtype | VRAM (GB) |
-|:-----------------------------:|:----------------------:|:---------------------:|:-------------:|:-----------------------:|
-| 2.93 | 1x A100 (40 GB SXM) | torch | bfloat16 | 25 |
-| 3.24 | 1x A6000 (48 GB) | torch | bfloat16 | 25 |
-
-The above runtime stats were generated from this [notebook](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/one_gpu/llama-2/dolphin/postprocessing-llama-2-7b-dolphin-peft.ipynb).
+coming

## Acknowledgements

-This model was finetuned by Daniel Furman on
+This model was finetuned by Daniel Furman on Sep 10, 2023 and is intended primarily for research purposes.

## Disclaimer
