---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
- code
- lora
- peft
base_model: unsloth/tinyllama-chat-bnb-4bit
pipeline_tag: text-generation
datasets:
- Ramikan-BR/data-oss_instruct-decontaminated_python.jsonl
---
# Uploaded model
- **Developed by:** Ramikan-BR
- **Model type:** Text generation / Python coder
- **Language(s) (NLP):** en
- **License:** apache-2.0
- **Finetuned from model:** unsloth/tinyllama-chat-bnb-4bit
### Model Description
A TinyLlama-Chat model fine-tuned with LoRA (via Unsloth) on a Python-only subset of the Magicoder OSS-Instruct data, intended as a lightweight Python coding assistant.
### Training Data
The model was fine-tuned on [Ramikan-BR/data-oss_instruct-decontaminated_python.jsonl](https://huggingface.co/datasets/Ramikan-BR/data-oss_instruct-decontaminated_python.jsonl).
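The data can be pulled straight from the Hub. The snippet below is a minimal sketch; it assumes the repository is readable as a plain JSONL dataset with the `problem` and `solution` fields used by the training notebook further down.

```python
from datasets import load_dataset

# Minimal sketch: load the Python-only OSS-Instruct data from the Hub.
# Assumes the repo resolves as a plain JSONL dataset with "problem"/"solution" fields.
dataset = load_dataset("Ramikan-BR/data-oss_instruct-decontaminated_python.jsonl", split="train")
print(dataset[0]["problem"][:200])
```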
### Training Procedure
The model was fine-tuned with [Unsloth](https://github.com/unslothai/unsloth). The [ise-uiuc/Magicoder-OSS-Instruct-75K](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K/blob/main/data-oss_instruct-decontaminated.jsonl) dataset was filtered to keep only the Python examples and split into 10 parts; each part was trained for 2 epochs, using either the adafactor or the adamw_8bit optimizer (adafactor appears to give a lower loss).
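The exact preprocessing script is not included in this card. The sketch below only illustrates the described steps (keep Python-only examples, split into 10 parts) and assumes the source file exposes a `lang` column; the output file names are illustrative.

```python
from datasets import load_dataset

# Illustrative sketch (not the original script): filter Magicoder-OSS-Instruct for Python
# and write 10 JSONL shards for the incremental fine-tuning runs described above.
ds = load_dataset(
    "json",
    data_files="https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K/resolve/main/data-oss_instruct-decontaminated.jsonl",
    split="train",
)
ds_py = ds.filter(lambda ex: ex["lang"] == "python")  # assumes a "lang" column

num_parts = 10
for i in range(num_parts):
    shard = ds_py.shard(num_shards=num_parts, index=i, contiguous=True)
    shard.to_json(f"data-oss_instruct-py-{i + 1}.jsonl")  # illustrative file names
```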
### Model Sources
- **Base model:** [unsloth/tinyllama-chat-bnb-4bit](https://huggingface.co/unsloth/tinyllama-chat-bnb-4bit)
- **Model:** [Ramikan-BR/tinyllama-coder-py-4bit-v10](https://huggingface.co/Ramikan-BR/tinyllama-coder-py-4bit-v10)
- **GGUF (F16):** [tinyllama-coder-py-4bit-v10-unsloth.F16.gguf](https://huggingface.co/Ramikan-BR/tinyllama-coder-py-4bit-v10/blob/main/tinyllama-coder-py-4bit-v10-unsloth.F16.gguf)
- **GGUF (Q4_K_M):** [tinyllama-coder-py-4bit-v10-unsloth.Q4_K_M.gguf](https://huggingface.co/Ramikan-BR/tinyllama-coder-py-4bit-v10/blob/main/tinyllama-coder-py-4bit-v10-unsloth.Q4_K_M.gguf)
- **GGUF (Q8_0):** [tinyllama-coder-py-4bit-v10-unsloth.Q8_0.gguf](https://huggingface.co/Ramikan-BR/tinyllama-coder-py-4bit-v10/blob/main/tinyllama-coder-py-4bit-v10-unsloth.Q8_0.gguf)
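For local inference on the GGUF files above, a minimal sketch with `llama-cpp-python` (one possible GGUF runtime; any llama.cpp-compatible tool works) could look like this. The prompt follows the template used in the training notebook below; the example instruction is illustrative.

```python
# Sketch: run the Q4_K_M GGUF with llama-cpp-python
# (pip install llama-cpp-python huggingface-hub).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Ramikan-BR/tinyllama-coder-py-4bit-v10",
    filename="tinyllama-coder-py-4bit-v10-unsloth.Q4_K_M.gguf",
    n_ctx=4096,
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Input:\n"
    "Write a Python function that reverses a string.\n"  # illustrative instruction
    "### Output:\n"
)
out = llm(prompt, max_tokens=256, stop=["</s>"])
print(out["choices"][0]["text"])
```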
#### Training Hyperparameters
The [Unsloth](https://github.com/unslothai/unsloth) notebook used for fine-tuning: [TinyLlama Colab notebook](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
```python
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers trl peft accelerate bitsandbytes # xformers "xformers<0.0.26"
import os
from google.colab import drive
drive.mount('/content/drive')
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/llama-2-13b-bnb-4bit",
    "unsloth/codellama-34b-bnb-4bit",
    "unsloth/tinyllama-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit", # New Google 6 trillion tokens model 2.5x faster!
    "unsloth/gemma-2b-bnb-4bit",
] # More models at https://huggingface.co/unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Ramikan-BR/tinyllama-coder-py-4bit_LORA-v9", # "unsloth/tinyllama" for 16bit loading
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 256, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 512,
    lora_dropout = 0, # Currently only supports dropout = 0
    bias = "none", # Currently only supports bias = "none"
    use_gradient_checkpointing = True, # @@@ IF YOU GET OUT OF MEMORY - set to True @@@
    random_state = 3407,
    use_rslora = False, # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
alpaca_prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Input:
{}
### Output:
{}"""
EOS_TOKEN = tokenizer.eos_token
def formatting_prompts_func(examples):
    inputs = examples["problem"]
    outputs = examples["solution"]
    texts = []
    for input, output in zip(inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts }
pass
from datasets import load_dataset
dataset = load_dataset('json', data_files='/content/drive/MyDrive/data-oss_instruct-py-10.jsonl', split='train')
dataset = dataset.map(formatting_prompts_func, batched=True)
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
from transformers.utils import logging
logging.set_verbosity_info()
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = True, # Packs short sequences together to save time!
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 256,
        warmup_ratio = 0.1,
        num_train_epochs = 2,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adafactor", # adamw_torch, or adamw_torch_fused (+10% speed), or adafactor, or adamw_8bit
        weight_decay = 0.1,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
trainer_stats = trainer.train()
model.save_pretrained("lora_model") # Local saving
tokenizer.save_pretrained("lora_model")
model.push_to_hub("Ramikan-BR/tinyllama-coder-py-4bit_LORA-v10", token = "hf_...") # Online saving
tokenizer.push_to_hub("Ramikan-BR/tinyllama-coder-py-4bit_LORA-v10", token = "hf_...") # Online saving
# Merge to 16bit
model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
model.push_to_hub_merged("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, save_method = "merged_16bit", token = "hf_...")
# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, save_method = "merged_4bit", token = "hf_...")
# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, save_method = "lora", token = "hf_...")
# Save to 8bit Q8_0
model.save_pretrained_gguf("model", tokenizer,)
model.push_to_hub_gguf("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, token = "hf_...")
# Save to 16bit GGUF
model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
model.push_to_hub_gguf("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, quantization_method = "f16", token = "hf_...")
# Save to q4_k_m GGUF
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
model.push_to_hub_gguf("Ramikan-BR/tinyllama-coder-py-4bit-v10", tokenizer, quantization_method = "q4_k_m", token = "hf_...")
```
Training loss from the final run on the last part of the dataset (Unsloth, 1 GPU; 407 examples, 5 epochs, batch size 2 per device, gradient accumulation 256, effective batch size 512, 5 total steps, 201,850,880 trainable parameters, elapsed time 29:36):

| Step | Training Loss |
|------|---------------|
| 1    | 0.568000      |
| 2    | 0.145300      |
| 3    | 0.506100      |
| 4    | 0.331900      |
| 5    | 0.276100      |
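A minimal inference sketch with 🤗 Transformers against the merged 16-bit weights; the prompt mirrors the fine-tuning template above, and the example instruction is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Ramikan-BR/tinyllama-coder-py-4bit-v10"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Same prompt format as used during fine-tuning.
prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Input:
Write a Python function that checks whether a number is prime.
### Output:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Print only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```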
This is the model card of a 🤗 Transformers model that has been pushed to the Hub. The card was generated automatically.
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)