---
language:
- en
- zh
license: apache-2.0
tags:
- text-generation-inference
- transformers
- Chinese
- unsloth
- llama
- trl
base_model: waylandzhang/Llama-3-8b-Chinese-Novel-4bit-lesson-v0.1

---

# Uploaded model

- **Developed by:** waylandzhang
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit

Teaching-purpose model. It exists only to accompany my video tutorials :D


**QLoRA (4bit)**

The following parameters replicate the training run.
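
The config snippets below assume a 4-bit base model has already been loaded with Unsloth. A minimal sketch of that step, which the card itself does not show (`max_seq_length` is taken from the inference code further down):

```python
from unsloth import FastLanguageModel

# Load the 4-bit base model listed under "Finetuned from model"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=4096,
    dtype=None,          # auto-detect: bf16 on Ampere+, fp16 otherwise
    load_in_4bit=True,
)
```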

PEFT config
```python
# kwargs as listed on the card, wrapped in Unsloth's
# get_peft_model call so the snippet is runnable
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    random_state=3407,
    use_rslora=False,   # rank-stabilized LoRA disabled
    loftq_config=None,  # no LoftQ initialization
)
```


Training args
```python
import torch
from transformers import TrainingArguments

# kwargs as listed on the card, wrapped in TrainingArguments
# so the snippet is runnable
args = TrainingArguments(
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,  # set to 4 to avoid issues with GPTQ quantization
    warmup_steps=5,
    max_steps=300,  # fine-tune iterations
    learning_rate=2e-4,
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    evaluation_strategy="steps",
    prediction_loss_only=True,
    eval_accumulation_steps=1,
    eval_steps=10,
    logging_steps=1,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="cosine",  # instead of "linear"
    seed=1337,
    output_dir="wayland-files/models",
    report_to="wandb",  # log report to W&B
)
```
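
The card does not show the training loop itself. Below is a minimal sketch of how the pieces above could be wired together with TRL's `SFTTrainer`; the dataset path, column names, and the Alpaca-style `formatting_func` (which reuses the `alpaca_prompt` template from the inference code below) are assumptions, not the author's actual setup:

```python
from datasets import load_dataset
from trl import SFTTrainer

EOS_TOKEN = tokenizer.eos_token  # appended so the model learns to stop

def formatting_func(examples):
    # Hypothetical Alpaca-style formatting; column names are assumptions
    texts = []
    for instruction, inp, out in zip(
        examples["instruction"], examples["input"], examples["output"]
    ):
        texts.append(alpaca_prompt.format(instruction, inp, out) + EOS_TOKEN)
    return {"text": texts}

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder
dataset = dataset.map(formatting_func, batched=True)
split = dataset.train_test_split(test_size=0.1, seed=1337)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=split["train"],
    eval_dataset=split["test"],  # evaluation_strategy="steps" needs an eval set
    dataset_text_field="text",
    max_seq_length=4096,
    args=args,  # the TrainingArguments defined above
)
trainer.train()
```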



**Inference Code**


```python
from unsloth import FastLanguageModel
import torch

max_seq_length = 4096  # 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="waylandzhang/Llama-3-8b-Chinese-Novel-4bit-lesson-v0.1",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    device_map="cuda",
    attn_implementation="flash_attention_2",
)

FastLanguageModel.for_inference(model)  # Unsloth's inference mode is ~2x faster

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "给你一段话,帮我继续写下去。",  # instruction: "Here is a passage; continue writing it."
            "小明在西安城墙上",  # input: "Xiao Ming is on the Xi'an city wall"
            "",  # output - leave empty to generate / pre-fill to force a prefix
        )
    ], return_tensors="pt").to("cuda")

# Option 1: generate, then decode the completion in one go
# outputs = model.generate(**inputs, max_new_tokens=500, use_cache=True)
# print(tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True))

# Option 2: stream tokens as they are generated
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=500)
```
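
Both options suppress the echoed prompt: option 1 decodes only the tokens after `inputs.input_ids`, while option 2 relies on `skip_prompt=True` so the streamer prints just the completion. Pre-filling the third `alpaca_prompt` slot makes the model continue from that prefix instead of generating the response from scratch.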




This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)