AlphaRandy committed
Commit 9cc8433 · verified · 1 Parent(s): 84443a2

Update README.md

Files changed (1):
  1. README.md +125 -39
README.md CHANGED
@@ -1,58 +1,144 @@
  ---
- license: other
- library_name: transformers
- tags:
- - trl
- - sft
- - generated_from_trainer
- base_model: AlphaRandy/WhelanBot
- model-index:
- - name: WhelanBot
-   results: []
- pipeline_tag: question-answering
  ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # gemma-chatbot
-
- This model is a fine-tuned version of [google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it) on the None dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 1
- - eval_batch_size: 8
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - training_steps: 250
- - mixed_precision_training: Native AMP
-
- ### Training results
-
- ### Framework versions
-
- - PEFT 0.10.0
- - Transformers 4.40.0.dev0
- - Pytorch 2.2.1+cu121
- - Datasets 2.18.0
- - Tokenizers 0.15.2
  ---
+ inference:
+   parameters:
+     temperature: 0.5
+ widget:
+ - messages:
+   - role: user
+     content: Hey Bud
  ---
+
+ ## Instruction format
+
+ The following format must be strictly respected, otherwise the model will generate sub-optimal outputs.
+
+ The template used to build a prompt for the Instruct model is defined as follows:
+ ```
+ <s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]
+ ```
+ Note that `<s>` and `</s>` are special tokens for beginning of string (BOS) and end of string (EOS), while `[INST]` and `[/INST]` are regular strings.
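+
+ For illustration, a hypothetical two-turn exchange (the messages below are invented for this example) would be serialized as:
+ ```
+ <s> [INST] Hey Bud [/INST] Hi there, how can I help?</s> [INST] Tell me a joke [/INST]
+ ```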
+
+ For reference, here is the pseudo-code used to tokenize instructions during fine-tuning:
+ ```python
+ def tokenize(text):
+     return tok.encode(text, add_special_tokens=False)
+
+ [BOS_ID] +
+ tokenize("[INST]") + tokenize(USER_MESSAGE_1) + tokenize("[/INST]") +
+ tokenize(BOT_MESSAGE_1) + [EOS_ID] +
+ ...
+ tokenize("[INST]") + tokenize(USER_MESSAGE_N) + tokenize("[/INST]") +
+ tokenize(BOT_MESSAGE_N) + [EOS_ID]
+ ```
+
+ In the pseudo-code above, note that the `tokenize` method should not add a BOS or EOS token automatically, but should add a prefix space.
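+
+ As a runnable counterpart to that pseudo-code, here is a minimal sketch (it assumes the tokenizer shipped in this repository defines the usual BOS/EOS special tokens; the conversation is invented for the example):
+
+ ```python
+ from transformers import AutoTokenizer
+
+ tok = AutoTokenizer.from_pretrained("AlphaRandy/WhelanBot")
+
+ def tokenize(text):
+     # No automatic special tokens; BOS/EOS ids are added by hand below.
+     return tok.encode(text, add_special_tokens=False)
+
+ turns = [("Hey Bud", "Hi there, how can I help?")]  # (user, assistant) pairs
+
+ ids = [tok.bos_token_id]
+ for user_msg, bot_msg in turns:
+     ids += tokenize("[INST]") + tokenize(user_msg) + tokenize("[/INST]")
+     ids += tokenize(bot_msg) + [tok.eos_token_id]
+
+ print(tok.decode(ids))  # should read back as the template shown above
+ ```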
+
+ In the Transformers library, one can use [chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating), which make sure the right format is applied.
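+
+ For example, to inspect the exact prompt string the chat template produces (a minimal sketch; the rendered output depends on the template stored in this repository):
+
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("AlphaRandy/WhelanBot")
+
+ messages = [{"role": "user", "content": "Hey Bud"}]
+
+ # Render to a plain string instead of token ids to see the applied format.
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ print(prompt)
+ ```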
+
+ ## Run the model
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "AlphaRandy/WhelanBot"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+
+ messages = [
+     {"role": "user", "content": "What is your favourite condiment?"},
+     {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
+     {"role": "user", "content": "Do you have mayonnaise recipes?"}
+ ]
+
+ inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
+
+ outputs = model.generate(inputs, max_new_tokens=20)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
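+
+ The widget metadata above pins the hosted inference temperature to 0.5; to get comparable behaviour locally, sampling arguments can be passed to `generate` (illustrative values, reusing the objects from the snippet above):
+
+ ```python
+ outputs = model.generate(
+     inputs,
+     max_new_tokens=256,   # illustrative; adjust to taste
+     do_sample=True,
+     temperature=0.5,      # matches the widget setting above
+ )
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```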
+
+ By default, Transformers loads the model in full precision. To reduce the memory needed to run the model, you can use the optimizations available in the Hugging Face ecosystem:
+
+ ### In half-precision
+
+ Note that `float16` precision only works on GPU devices.
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```diff
+ + import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "AlphaRandy/WhelanBot"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ + model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
+
+ messages = [
+     {"role": "user", "content": "What is your favourite condiment?"},
+     {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
+     {"role": "user", "content": "Do you have mayonnaise recipes?"}
+ ]
+
+ input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
+
+ outputs = model.generate(input_ids, max_new_tokens=20)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+ </details>
+
+ ### Lower precision (8-bit & 4-bit) using `bitsandbytes`
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```diff
+ + import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "AlphaRandy/WhelanBot"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ + model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map="auto")
+
+ messages = [
+     {"role": "user", "content": "What is your favourite condiment?"},
+     {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
+     {"role": "user", "content": "Do you have mayonnaise recipes?"}
+ ]
+
+ input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
+
+ outputs = model.generate(input_ids, max_new_tokens=20)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+ </details>
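+
+ Note: recent Transformers releases usually express this through a `BitsAndBytesConfig` rather than the bare `load_in_4bit` flag; an equivalent sketch (assuming `bitsandbytes` is installed) would be:
+
+ ```python
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+ quant_config = BitsAndBytesConfig(load_in_4bit=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     "AlphaRandy/WhelanBot",
+     quantization_config=quant_config,
+     device_map="auto",
+ )
+ ```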
+
+ ### Load the model with Flash Attention 2
120
+
121
+ <details>
122
+ <summary> Click to expand </summary>
123
+
124
+ ```diff
125
+ + import torch
126
+ from transformers import AutoModelForCausalLM, AutoTokenizer
127
+
128
+ model_id = "AlphaRandy/WhelanBot"
129
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
130
+
131
+ + model = AutoModelForCausalLM.from_pretrained(model_id, use_flash_attention_2=True, device_map="auto")
132
+
133
+ messages = [
134
+ {"role": "user", "content": "What is your favourite condiment?"},
135
+ {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
136
+ {"role": "user", "content": "Do you have mayonnaise recipes?"}
137
+ ]
138
+
139
+ input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
140
+
141
+ outputs = model.generate(input_ids, max_new_tokens=20)
142
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
143
+ ```
144
+ </details>
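+
+ Note: newer Transformers releases replace `use_flash_attention_2=True` with the `attn_implementation` argument; an equivalent sketch (assuming `flash-attn` is installed and a supported GPU is available) would be:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "AlphaRandy/WhelanBot",
+     torch_dtype=torch.float16,                 # Flash Attention 2 requires fp16 or bf16
+     attn_implementation="flash_attention_2",
+     device_map="auto",
+ )
+ ```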