---
language:
- en
- zh
license: apache-2.0
tags:
- text-generation-inference
- transformers
- Chinese
- unsloth
- llama
- trl
base_model: waylandzhang/Llama-3-8b-Chinese-Novel-4bit-lesson-v0.1
---
# Uploaded model
- **Developed by:** waylandzhang
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit

A teaching-purpose model: it exists only to accompany my video tutorials. :D
**QLoRA (4-bit)**

Parameters to reproduce the training run:

PEFT config
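```python
from unsloth import FastLanguageModel

# Keyword arguments for Unsloth's FastLanguageModel.get_peft_model;
# `model` is the 4-bit base model returned by FastLanguageModel.from_pretrained.
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    random_state=3407,
    use_rslora=False,  # Rank-stabilized LoRA
    loftq_config=None,  # LoftQ
)
```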
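As a sanity check on how small this adapter is, the sketch below counts the trainable parameters that `r=8` adds across the seven target modules and computes the `lora_alpha / r` scaling applied to the LoRA update. It assumes the standard Llama-3-8B layer shapes (hidden size 4096, MLP size 14336, 32 layers, 8 KV heads of size 128), which this card does not state; `lora_trainable_params` and `lora_scaling` are hypothetical helper names, and the scaling rule follows the usual LoRA / rsLoRA definitions.

```python
# Rough count of trainable LoRA parameters for the PEFT config above.
# ASSUMPTION: standard Llama-3-8B dimensions (not stated in this model card).
import math

HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
KV_DIM = 8 * 128  # num_key_value_heads * head_dim (grouped-query attention)

# (in_features, out_features) of each Linear layer targeted by LoRA
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(r: int) -> int:
    """Each adapted Linear gets A (r x in) and B (out x r): r * (in + out) weights."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


def lora_scaling(r: int, alpha: int, use_rslora: bool = False) -> float:
    """Factor applied to the B @ A update: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


if __name__ == "__main__":
    print(lora_trainable_params(r=8))   # 20971520
    print(lora_scaling(r=8, alpha=16))  # 2.0
```

Doubling `r` doubles the adapter size; with `lora_alpha=16` held fixed it also halves the update scaling, whereas `use_rslora=True` would only divide it by √2.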
Training arguments

```python
import torch
from transformers import TrainingArguments

# transformers.TrainingArguments, handed to the TRL trainer via args=
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,  # set to 4 to avoid issues with GPTQ Quantization
    warmup_steps=5,
    max_steps=300,  # Fine-tune iterations
    learning_rate=2e-4,
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    evaluation_strategy="steps",  # newer transformers versions name this eval_strategy
    prediction_loss_only=True,
    eval_accumulation_steps=1,
    eval_steps=10,
    logging_steps=1,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="cosine",  # instead of "linear"
    seed=1337,
    output_dir="wayland-files/models",
    report_to="wandb",  # Log report to W&B
)
```
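The batch and schedule settings above imply a few derived numbers that are easy to lose track of. The sketch below computes the effective batch size and re-implements the linear-warmup plus cosine-decay shape that transformers' `"cosine"` scheduler uses (with its default half cycle). It assumes a single GPU, and `effective_batch_size` and `lr_at` are hypothetical helpers for illustration, not part of any library.

```python
# Back-of-the-envelope view of the schedule implied by the training args above.
# ASSUMPTIONS: single GPU; lr_at() mirrors the warmup + cosine-decay shape of
# transformers' "cosine" scheduler (num_cycles=0.5), re-implemented by hand.
import math

PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
MAX_STEPS = 300
WARMUP_STEPS = 5
PEAK_LR = 2e-4
EVAL_STEPS = 10


def effective_batch_size(num_gpus: int = 1) -> int:
    """Sequences contributing to each optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM * num_gpus


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(effective_batch_size())              # 8
    print(effective_batch_size() * MAX_STEPS)  # 2400 training sequences per GPU
    print(MAX_STEPS // EVAL_STEPS)             # 30 evaluation runs
    for step in (0, 5, 150, 300):
        print(step, f"{lr_at(step):.2e}")
```

In other words, the whole run sees only about 2,400 training sequences, which fits the card's teaching purpose rather than a full-scale fine-tune.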
**Inference Code**
```python
from unsloth import FastLanguageModel
import os
import torch
max_seq_length = 4096 # 2048
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="waylandzhang/Llama-3-8b-Chinese-Novel-4bit-lesson-v0.1",
max_seq_length=max_seq_length,
dtype=dtype,
load_in_4bit=load_in_4bit,
device_map="cuda",
attn_implementation="flash_attention_2"
)
FastLanguageModel.for_inference(model)  # Unsloth's inference mode makes generation about 2x faster
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
inputs = tokenizer(
[
alpaca_prompt.format(
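"给你一段话,帮我继续写下去。",  # instruction: "Here is a passage; please continue writing it."
"小明在西安城墙上",  # user input: "Xiao Ming, on the Xi'an city wall"
"",  # output: leave empty to let the model generate, or fill in to prefill the response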
)
], return_tensors="pt").to("cuda")
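# Option 1: generate the whole continuation, then decode only the newly generated tokens
# outputs = model.generate(**inputs, max_new_tokens=500, use_cache=True)
# print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
# Option 2: stream tokens to stdout as they are generated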
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=500)
```
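The prompt template, together with the "leave the output slot empty to generate, fill it to prefill" convention, can be wrapped in two small helpers. This is a sketch: `build_prompt`, `extract_response` and `RESPONSE_MARKER` are hypothetical names, and the template assumes the blank-line layout of Unsloth's Alpaca example, which should match whatever format was used in training. `extract_response` is useful with the first (commented-out) generation option if you decode the full output including the prompt.

```python
# Hypothetical helpers around the Alpaca-style prompt used in the inference code.
# ASSUMPTION: this template matches the one the model was fine-tuned with
# (sections separated by blank lines, as in Unsloth's Alpaca example).
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str, user_input: str, prefill: str = "") -> str:
    """Fill the template; a non-empty `prefill` forces the response to start with that text."""
    return ALPACA_PROMPT.format(instruction, user_input, prefill)


def extract_response(decoded: str) -> str:
    """Return the text after the last response marker; if absent, the whole string, stripped."""
    _, _, response = decoded.rpartition(RESPONSE_MARKER)
    return response.strip()


if __name__ == "__main__":
    # Same task as the inference example: continue a passage starting from the given input
    prompt = build_prompt("给你一段话,帮我继续写下去。", "小明在西安城墙上")
    print(prompt.endswith(RESPONSE_MARKER + "\n"))  # True
    fake_generation = prompt + "<continuation>"  # placeholder, not real model output
    print(extract_response(fake_generation))       # <continuation>
```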
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)