---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
- code
- lora
- peft
base_model: unsloth/tinyllama-chat-bnb-4bit
pipeline_tag: text-generation
datasets: Ramikan-BR/data-oss_instruct-decontaminated_python.jsonl
---
# Uploaded model
- **Developed by:** Ramikan-BR
- **Model type:** [text-generation/Python Coder]
- **Language(s) (NLP):** [en]
- **License:** apache-2.0
- **Finetuned from model:** unsloth/tinyllama-chat-bnb-4bit
### Model Description
<!-- Provide a longer summary of what this model is. -->
### Training Data
datasets: [Ramikan-BR/data-oss_instruct-decontaminated_python.jsonl](https://huggingface.co/datasets/Ramikan-BR/data-oss_instruct-decontaminated_python.jsonl)
### Training Procedure
The model was fine-tuned with [Unsloth](https://github.com/unslothai/unsloth). The dataset [ise-uiuc/Magicoder-OSS-Instruct-75K](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K/blob/main/data-oss_instruct-decontaminated.jsonl) was filtered to keep only the Python examples and then split into 10 parts. Each part was trained for 2 epochs with either the adafactor or the adamw_8bit optimizer (adafactor seemed to give a lower loss).
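The filtering and splitting step described above could look roughly like the sketch below. It is not the author's script: the `lang` field name, the helper names `filter_python` and `split_into_parts`, and the sample records are assumptions made for illustration (the `problem` and `solution` fields are the ones the training notebook reads).

```python
import json


def filter_python(lines, lang_key="lang", wanted="python"):
    """Parse JSONL lines and keep only records whose language matches `wanted`.

    `lang_key` is an assumed field name; adjust it to the actual dataset schema.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if str(record.get(lang_key, "")).lower() == wanted:
            records.append(record)
    return records


def split_into_parts(records, n_parts=10):
    """Split a list into `n_parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(records), n_parts)
    parts = []
    start = 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        parts.append(records[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    # Hypothetical sample lines standing in for the real JSONL file.
    sample_lines = [
        json.dumps({"lang": "python", "problem": f"p{i}", "solution": f"s{i}"})
        for i in range(23)
    ] + [json.dumps({"lang": "rust", "problem": "p", "solution": "s"})]

    python_records = filter_python(sample_lines)
    parts = split_into_parts(python_records, n_parts=10)
    for index, part in enumerate(parts, start=1):
        # Each part would be written to its own file and trained on in turn.
        print(f"part {index}: {len(part)} records")
```

Each part can then be written back out as its own JSONL file and used for one training session, with the next session starting from the adapter produced by the previous one.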
### Model Sources
- base_model: [unsloth/tinyllama-chat-bnb-4bit](https://huggingface.co/unsloth/tinyllama-chat-bnb-4bit)
- model: [Ramikan-BR/tinyllama-coder-py-4bit-v10](https://huggingface.co/Ramikan-BR/tinyllama-coder-py-4bit-v10)
- gguf_f16: [tinyllama-coder-py-4bit-v10-unsloth.F16.gguf](https://huggingface.co/Ramikan-BR/tinyllama-coder-py-4bit-v10/blob/main/tinyllama-coder-py-4bit-v10-unsloth.F16.gguf)
- gguf_Q4_K_M: [tinyllama-coder-py-4bit-v10-unsloth.Q4_K_M.gguf](https://huggingface.co/Ramikan-BR/tinyllama-coder-py-4bit-v10/blob/main/tinyllama-coder-py-4bit-v10-unsloth.Q4_K_M.gguf)
- gguf_Q8_0: [tinyllama-coder-py-4bit-v10-unsloth.Q8_0.gguf](https://huggingface.co/Ramikan-BR/tinyllama-coder-py-4bit-v10/blob/main/tinyllama-coder-py-4bit-v10-unsloth.Q8_0.gguf)
#### Training Hyperparameters
The [Unsloth](https://github.com/unslothai/unsloth) notebook I used for fine-tuning: [TinyLlama](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
```python
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers trl peft accelerate bitsandbytes # xformers "xformers<0.0.26"
import os
from google.colab import drive
drive.mount('/content/drive')
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/llama-2-13b-bnb-4bit",
    "unsloth/codellama-34b-bnb-4bit",
    "unsloth/tinyllama-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit", # New Google 6 trillion tokens model 2.5x faster!
    "unsloth/gemma-2b-bnb-4bit",
] # More models at https://huggingface.co/unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Ramikan-BR/tinyllama-coder-py-4bit_LORA-v9", # "unsloth/tinyllama" for 16bit loading
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 256, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 512,
    lora_dropout = 0, # Currently only supports dropout = 0
    bias = "none", # Currently only supports bias = "none"
    use_gradient_checkpointing = True, # @@@ IF YOU GET OUT OF MEMORY - set to True @@@
    random_state = 3407,
    use_rslora = False, # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
alpaca_prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Input:
{}

### Output:
{}"""
EOS_TOKEN = tokenizer.eos_token
def formatting_prompts_func(examples):
    # Each batch carries parallel lists of problems and their solutions.
    problems = examples["problem"]
    solutions = examples["solution"]
    texts = []
    for problem, solution in zip(problems, solutions):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(problem, solution) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts}
from datasets import load_dataset
dataset = load_dataset('json', data_files='/content/drive/MyDrive/data-oss_instruct-py-10.jsonl', split='train')
dataset = dataset.map(formatting_prompts_func, batched=True)
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
from transformers.utils import logging
logging.set_verbosity_info()
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = True, # Packs short sequences together to save time!
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 256,
        warmup_ratio = 0.1,
        num_train_epochs = 2,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adafactor", # alternatives: adamw_torch, adamw_torch_fused (+10% speed), adafactor, adamw_8bit
        weight_decay = 0.1,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
trainer_stats = trainer.train()
model.save_pretrained("lora_model") # Local saving
tokenizer.save_pretrained("lora_model")
model.push_to_hub("Ramikan-BR/tinyllama-coder-py-4bit_LORA-v10", token = "hf_...") # Online saving
tokenizer.push_to_hub("Ramikan-BR/tinyllama-coder-py-4bit_LORA-v10", token = "hf_...") # Online saving
# Merge to 16bit
model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
model.push_to_hub_merged("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, save_method = "merged_16bit", token = "hf_...")
# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, save_method = "merged_4bit", token = "hf_...")
# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, save_method = "lora", token = "hf_...")
# Save to 8bit Q8_0
model.save_pretrained_gguf("model", tokenizer,)
model.push_to_hub_gguf("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, token = "hf_...")
# Save to 16bit GGUF
model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
model.push_to_hub_gguf("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, quantization_method = "f16", token = "hf_...")
# Save to q4_k_m GGUF
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
model.push_to_hub_gguf("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, quantization_method = "q4_k_m", token = "hf_...")
# Loss over 5 epochs in the last training session, on the last part of the dataset:
# ==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
#    \\   /|    Num examples = 407 | Num Epochs = 5
# O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 256
# \        /    Total batch size = 512 | Total steps = 5
#  "-____-"     Number of trainable parameters = 201,850,880
# [5/5 29:36, Epoch 3/5]
# Step  Training Loss
# 1     0.568000
# 2     0.145300
# 3     0.506100
# 4     0.331900
# 5     0.276100
# Quick test 1 after training the last part of the dataset:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the fibonnaci sequence.", # fills the "### Input:" slot
        "1, 1, 2, 3, 5, 8", # fills the "### Output:" slot, so generation continues from it
        "", # unused: alpaca_prompt has only two placeholders
    )
], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
# AI Response: ['<s> Below is an instruction that describes a task. Write a response that appropriately completes the request.\n### Input:\nContinue the fibonnaci sequence.\n\n### Output:\n1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 420, 787, 1444, 2881, 4765, 8640']
# Quick test 2 after training the last part of the dataset:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the fibonnaci sequence.", # fills the "### Input:" slot
        "1, 1, 2, 3, 5, 8", # fills the "### Output:" slot, so generation continues from it
        "", # unused: alpaca_prompt has only two placeholders
    )
], return_tensors = "pt").to("cuda")
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
# AI Response: <s> Below is an instruction that describes a task. Write a response that appropriately completes the request.
# ### Input:
# Continue the fibonnaci sequence.
# ### Output:
# 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 420, 787, 1444, 2881, 4765, 8640, 17281, 31362, 65325, 128672, 251345, 410000, 720000, 1280000,
# Quick test 3 after training the last part of the dataset:
if False:
    # Set to True to reload the locally saved LoRA adapters instead of reusing the model in memory.
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference
# alpaca_prompt = You MUST copy from above!
inputs = tokenizer(
[
    alpaca_prompt.format(
        "What is a famous tall tower in Paris?", # fills the "### Input:" slot
        "", # fills the "### Output:" slot - leave this blank for generation!
        "", # unused: alpaca_prompt has only two placeholders
    )
], return_tensors = "pt").to("cuda")
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)
# AI Response: <s> Below is an instruction that describes a task. Write a response that appropriately completes the request.
# ### Input:
# What is a famous tall tower in Paris?
# ### Output:
# The famous tall tower in Paris is the Eiffel Tower. It is a 300-meter-tall steel tower located in the heart of Paris, France. The tower was built in 18892 and is a popular tourist attraction. It is also a symbol of the city
outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
```
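The figures printed in the training log above can be checked by hand. The sketch below is only a back-of-the-envelope check and is not part of the original notebook: the layer shapes (hidden size 2048, MLP size 5632, 22 layers, 4 key/value heads of dimension 64) are assumed from the published TinyLlama-1.1B configuration, the helper names are hypothetical, and the step count uses a simplified rounding rule that may differ from the trainer's for other dataset sizes.

```python
import math

# Assumed TinyLlama-1.1B shapes (not stated in this model card).
HIDDEN = 2048
INTERMEDIATE = 5632
NUM_LAYERS = 22
KV_DIM = 4 * 64  # 4 key/value heads of dimension 64 (grouped-query attention)

# (in_features, out_features) for each module listed in target_modules.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(rank):
    """LoRA adds an A matrix (rank x in) and a B matrix (out x rank) per target module."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


def total_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs):
    """Return (effective batch size, total optimizer steps) under a simplified rounding rule."""
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = max(1, math.ceil(num_examples / effective_batch))
    return effective_batch, steps_per_epoch * epochs


print(lora_trainable_params(256))             # 201850880
print(total_optimizer_steps(407, 2, 256, 5))  # (512, 5)
print(512 / 256)                              # 2.0, the lora_alpha / r scaling with use_rslora = False
```

With `r = 256` on all seven projection modules, the adapters alone hold about 202 million trainable parameters, which matches the count in the log. With only 407 packed examples and an effective batch of 512, each epoch amounts to a single optimizer step, so the five logged loss values correspond to five epochs.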
This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)