---
license: llama2
language:
- en
datasets:
- OpenAssistant/oasst1
---
# Open-Assistant Llama2 70B SFT v10

This model is an Open-Assistant fine-tuning of Meta's [Llama2 70B](https://huggingface.co/meta-llama/Llama-2-70b) LLM.


## Model Details

- **Finetuned from:** [meta-llama/Llama-2-70b](https://huggingface.co/meta-llama/Llama-2-70b) via [epfLLM/old-Megatron-LM](https://github.com/epfLLM/old-Megatron-LM)
- **Model type:** Causal decoder-only transformer language model
- **Language:** English, German, Spanish, French (and limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, Swedish)
- **Weights & Biases:** [Stage 1](https://wandb.ai/open-assistant/public-sft/runs/run45_oasst_pre10_llama2_70b) (1 epoch pretrain-mix, 12k steps), [Stage 2](https://wandb.ai/open-assistant/public-sft/runs/run46_oasst_sft10_llama2_70b) (3 epochs oasst top-1, 519 steps)
- **Demo:** [Continuations for 250 random prompts (TGI, 4bit nf4 quantization)](https://open-assistant.github.io/oasst-model-eval/?f=https%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Foasst-sft%2F2023-08-22_OpenAssistant_llama2-70b-oasst-sft-v10_sampling_noprefix2_nf4.json%0A)
- **Evaluation:** [FastEval-OpenAssistant Overview](https://tju01.github.io/FastEval-OpenAssistant/) (using [FastEval](https://github.com/FastEval/FastEval) & [vLLM](https://github.com/vllm-project/vllm))
- **License:** [LLAMA 2 COMMUNITY LICENSE AGREEMENT](https://huggingface.co/meta-llama/Llama-2-70b/raw/main/LICENSE.txt)
- **Contact:** [Open-Assistant Discord](https://ykilcher.com/open-assistant-discord)

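For quick experimentation, the model can be loaded with Hugging Face `transformers`. The sketch below is not an official snippet: the repository id is inferred from the demo link above, and the 4-bit nf4 settings mirror the quantization used for the demo; adjust both to your setup.

```python
# Minimal loading sketch (assumptions: repo id inferred from the demo link,
# bitsandbytes installed, enough GPU memory for a 70B model in 4-bit nf4).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "OpenAssistant/llama2-70b-oasst-sft-v10"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # same nf4 setting as the demo
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

# Prompt in the chatml format described in the next section.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is the capital of France?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
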
## Prompting / Prompt Template

The model was trained with OpenAI's [chatml](https://github.com/openai/openai-python/blob/main/chatml.md) prompt format:
"<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{user prompt}<|im_end|>\n<|im_start|>assistant\n{Assistant answer}<|im_end|>\n"


Multi-line:

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user prompt}<|im_end|>
<|im_start|>assistant
{Assistant answer}<|im_end|>
```

The model was partly trained with Orca system messages. For inference, we recommend using the official [Llama2 system prompt](https://github.com/facebookresearch/llama/blob/ea9f33d6d3ea8ed7d560d270986407fd6c2e52b7/example_chat_completion.py#L57-L61):
```
<|im_start|>system
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<|im_end|>
```

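A minimal Python sketch for assembling prompts in this format; the helper name and the message structure are illustrative, not part of the released code:

```python
def build_chatml_prompt(messages: list[dict]) -> str:
    """Flatten chat messages ({"role": ..., "content": ...}) into the chatml
    template above, leaving an open assistant turn for the model to complete."""
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    return prompt + "<|im_start|>assistant\n"

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful, respectful and honest assistant."},
    {"role": "user", "content": "What is the capital of France?"},
])
```
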
## Configuration Details

### Stage 1 Pretokenizer Configuration

```
oasst_pre10_min25:
  datasets:
    - megacode2:
        fraction: 0.5
        val_split: 0.01
        max_val_set: 1000
    - orca-chat:
        val_split: 0.01
        max_val_set: 1000
    - dolly15k_multilingual:
        val_split: 0.05
        max_val_set: 300
    - oa_leet10k:
        val_split: 0.05
        max_val_set: 250
  output_dir: "output/oasst_pre10_min25"
  filename_prefix: "oasst_pre10"
  min_assistant_tokens: 25
```

### Stage 2 Pretokenizer Configuration

```
oasst_top1:
  datasets:
    - oasst_export:
        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"
        input_file_path: 2023-07-23_oasst_ready.tar.gz
        top_k: 1
        val_split: 0.05
  output_dir: "output/oasst_top1_2023-07-23"
  filename_prefix: "oasst_top1"
```

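The per-dataset fields above are interpreted roughly as follows: `fraction` subsamples a dataset, `val_split` carves out a validation share, and `max_val_set` caps the validation size. The helper below is a hypothetical sketch of that bookkeeping (including the example dataset size), not part of the Open-Assistant pretokenizer itself:

```python
def split_sizes(num_examples: int, fraction: float = 1.0,
                val_split: float = 0.0, max_val_set: int | None = None) -> tuple[int, int]:
    """Assumed semantics of the config fields: keep `fraction` of the examples,
    then move `val_split` of those (capped at `max_val_set`) into validation."""
    used = int(num_examples * fraction)
    val = int(used * val_split)
    if max_val_set is not None:
        val = min(val, max_val_set)
    return used - val, val

# e.g. megacode2 from the Stage 1 config, with a hypothetical 200k examples:
train_n, val_n = split_sizes(200_000, fraction=0.5, val_split=0.01, max_val_set=1000)
```
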
### Megatron Fine-Tuning Arguments for Stage 1 (Instruction Tuning):
```
--tensor_model_parallel_size 8
--pipeline_model_parallel_size 4
--load ./akoepf/checkpoints/llama2-70b-tp8-pp4
--save ./akoepf/checkpoints/llama2-70b-tp8-pp4-oasst_pre10
--tensorboard_dir ./akoepf/checkpoints/llama2-70b-tp8-pp4-oasst_pre10/logging
--data_path ./akoepf/data/oasst_pre10_min25_llama2/oasst_sft10-train
--model_name llama2
--tokenizer_type SentencePieceTokenizer
--bf16
--global_batch_size 64
--micro_batch_size 2
--vocab_file=./akoepf/llama2/Llama-2-7b/tokenizer.model
--use_rms_norm
--glu_activation swiglu
--no_tie_embed_logits
--vocab_extra_ids_list "\"<|im_start|>,<|im_end|>\""
--layernorm_epsilon 1e-5
--use_flash_attn
--no_bias_gelu_fusion
--seq_length 4096
--max_position_embeddings 4096
--log_interval 1
--save_interval 500
--eval_interval 50
--eval_iters 10
--hidden_dropout 0.0
--position_embedding_type rotary
--no_bias_dropout_fusion
--use_checkpoint_args
--train_iters 12000
--attention_dropout 0.0
--adam_beta1 0.9
--adam_beta2 0.95
--adam_eps 1e-12
--lr_decay_style cosine
--lr_warmup_iters 100
--lr 1e-5
--min_lr 1e-6
--weight_decay 0.000001
--sequence_parallel
--recompute_granularity selective
--log_timers_to_tensorboard
--rope_scaling_factor 1.0
--wandb_logger
```

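Note that `--vocab_extra_ids_list` registers the chatml markers `<|im_start|>` and `<|im_end|>` as additional tokens on top of the Llama2 SentencePiece vocabulary. Roughly the same effect, expressed with the Hugging Face API (illustrative only; the published tokenizer for this model should already contain these tokens):

```python
from transformers import AutoTokenizer

# Base Llama2 tokenizer plus the two chatml markers used during fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]}
)
# When tokens are added, the model's embedding matrix must grow to match:
# model.resize_token_embeddings(len(tokenizer))
```
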
### Megatron Fine-Tuning Arguments for Stage 2 (OASST Polishing, LIMA Dropout):
```
--tensor_model_parallel_size 8
--pipeline_model_parallel_size 4
--load ./akoepf/checkpoints/llama2-70b-tp8-pp4-oasst_pre10
--save ./akoepf/checkpoints/llama2-70b-tp8-pp4-oasst_sft10
--tensorboard_dir ./akoepf/checkpoints/llama2-70b-tp8-pp4-oasst_sft10/logging
--data_path ./akoepf/data/oasst_top1_2023-07-23_llama2/oasst_top1-train
--model_name llama2
--tokenizer_type SentencePieceTokenizer
--bf16
--global_batch_size 64
--micro_batch_size 2
--vocab_file=./akoepf/llama2/Llama-2-7b/tokenizer.model
--use_rms_norm
--glu_activation swiglu
--no_tie_embed_logits
--vocab_extra_ids_list "\"<|im_start|>,<|im_end|>\""
--layernorm_epsilon 1e-5
--use_flash_attn
--no_bias_gelu_fusion
--seq_length 4096
--max_position_embeddings 4096
--log_interval 1
--save_interval 346
--eval_interval 50
--eval_iters 10
--hidden_dropout 0.25
--lima_dropout
--position_embedding_type rotary
--no_bias_dropout_fusion
--use_checkpoint_args
--train_iters 519
--attention_dropout 0.0
--adam_beta1 0.9
--adam_beta2 0.95
--adam_eps 1e-12
--lr_decay_style cosine
--lr_warmup_iters 100
--lr 1e-5
--min_lr 1e-6
--weight_decay 0.000001
--sequence_parallel
--recompute_granularity selective
--log_timers_to_tensorboard
--rope_scaling_factor 1.0
--finetune
--wandb_logger
```

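Here `--lima_dropout` enables the residual-dropout schedule from the LIMA paper, where the dropout rate rises linearly with layer depth up to the configured `--hidden_dropout` value (0.25 in this run) at the top layer. A small sketch of that schedule, assuming a linear ramp starting at 0.0 for the first layer (the actual implementation lives in the epfLLM Megatron fork):

```python
def lima_dropout_rates(num_layers: int, max_rate: float = 0.25) -> list[float]:
    """Per-layer residual dropout rate, increasing linearly from 0.0 at the
    first layer to `max_rate` at the last layer (LIMA-style schedule)."""
    if num_layers == 1:
        return [max_rate]
    return [max_rate * i / (num_layers - 1) for i in range(num_layers)]

# Llama2 70B has 80 transformer layers; --hidden_dropout 0.25 sets the top rate.
rates = lima_dropout_rates(80, max_rate=0.25)
```
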

## Ethical Considerations and Limitations

Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios.
For these reasons, as with all LLMs, the potential outputs of llama2-70b-oasst-sft-v10 cannot be predicted
in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses
to user prompts. Therefore, before deploying any applications of llama2-70b-oasst-sft-v10, developers should
perform safety testing and tuning tailored to their specific applications of the model.