ptrdvn commited on
Commit
4e0a150
Β·
verified Β·
1 Parent(s): 3a84a25

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +203 -0
README.md ADDED
@@ -0,0 +1,203 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: cc-by-nc-4.0
3
+ ---
4
+ # Suzume ORPO
5
+
6
+ <p align="center">
7
+ <img width=500 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/kWQSu02YfgYdUQqv4s5lq.png" alt="Suzume with Mitsu - a Japanese tree sparrow with honey on it"/>
8
+ </p>
9
+
10
+ [[Paper]](https://arxiv.org/abs/2405.18952) [[Dataset]](https://huggingface.co/datasets/lightblue/mitsu)
11
+
12
+ This is Suzume ORPO, an ORPO trained fine-tune of the [lightblue/suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual) model using our [lightblue/mitsu](https://huggingface.co/datasets/lightblue/mitsu) dataset.
13
+
14
+ We have trained several versions of this model using ORPO and so recommend that you use the best performing model from our tests, [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half).
15
+
16
+ Note that this model has a non-commerical license as we used the Command R and Command R+ models to generate our training data for this model ([lightblue/mitsu](https://huggingface.co/datasets/lightblue/mitsu)).
17
+
18
+ We are currently working on a developing a commerically usable model, so stay tuned for that!
19
+
20
+ # Model list
21
+
22
+ We have ORPO trained the following models using different proportions of the [lightblue/mitsu](https://huggingface.co/datasets/lightblue/mitsu) dataset:
23
+ * Trained on the top/bottom responses of all prompts in the dataset: [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-full](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-full)
24
+ * Trained on the top/bottom responses of the prompts of the 75\% most consistently ranked responses in the dataset: [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top75](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top75)
25
+ * Trained on the top/bottom responses of the prompts of the 50\% most consistently ranked responses in the dataset: [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half)
26
+ * Trained on the top/bottom responses of the prompts of the 25\% most consistently ranked responses in the dataset: [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top25](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top25)
27
+
28
+ # Model results
29
+
30
+ We compare the MT-Bench scores across 6 languages for our 4 ORPO trained models, as well as some baselines:
31
+
32
+ * [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) - The foundation model that our models are ultimately built upon
33
+ * [Nexusflow/Starling-LM-7B-beta](https://huggingface.co/Nexusflow/Starling-LM-7B-beta) - The highest performing open model on the Chatbot arena that is of a similar size to ours
34
+ * gpt-3.5-turbo - A fairly high quality (although not state-of-the-art) proprietary LLM
35
+ * [lightblue/suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual) - The base model which we train our ORPO finetunes from
36
+
37
+ | **MT-Bench language** | **meta-llama/Meta-Llama-3-8B-Instruct** | **Nexusflow/Starling-LM-7B-beta** | **gpt-3.5-turbo** | **lightblue/suzume-llama-3-8B-multilingual** | **lightblue/suzume-llama-3-8B-multilingual-orpo-borda-full** | **lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top75** | **lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half** | **lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top25** |
38
+ |-----------------------|-----------------------------------------|-----------------------------------|-------------------|----------------------------------------------|--------------------------------------------------------------|---------------------------------------------------------------|--------------------------------------------------------------|---------------------------------------------------------------|
39
+ | **Chinese πŸ‡¨πŸ‡³** | NaN | 6.97 | 7.55 | 7.11 | 7.65 | **7.77** | 7.74 | 7.44 |
40
+ | **English πŸ‡ΊπŸ‡Έ** | 7.98 | 7.92 | **8.26** | 7.73 | 7.98 | 7.94 | 7.98 | 8.22 |
41
+ | **French πŸ‡«πŸ‡·** | NaN | 7.29 | 7.74 | 7.66 | **7.84** | 7.46 | 7.78 | 7.81 |
42
+ | **German πŸ‡©πŸ‡ͺ** | NaN | 6.99 | 7.68 | 7.26 | 7.28 | 7.64 | 7.7 | **7.71** |
43
+ | **Japanese πŸ‡―πŸ‡΅** | NaN | 6.22 | **7.84** | 6.56 | 7.2 | 7.12 | 7.34 | 7.04 |
44
+ | **Russian πŸ‡·πŸ‡Ί** | NaN | 8.28 | 7.94 | 8.19 | 8.3 | 8.74 | **8.94** | 8.81 |
45
+
46
+ We can see noticable improvement on most languages compared to the base model. We also find that our ORPO models achieve the highest score out of all the models we evaluated for a number of languages.
47
+
48
+ # Training data
49
+
50
+ We trained this model using the [lightblue/mitsu_full_borda](https://huggingface.co/datasets/lightblue/mitsu_full_borda) dataset.
51
+
52
+ # Training configuration
53
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
54
+ should probably proofread and complete it, then remove this comment. -->
55
+
56
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
57
+ <details><summary>See axolotl config</summary>
58
+
59
+ axolotl version: `0.4.0`
60
+ ```yaml
61
+ base_model: lightblue/suzume-llama-3-8B-multilingual
62
+ model_type: LlamaForCausalLM
63
+ tokenizer_type: AutoTokenizer # PreTrainedTokenizerFast
64
+
65
+ load_in_8bit: false
66
+ load_in_4bit: false
67
+ strict: false
68
+
69
+ rl: orpo
70
+ orpo_alpha: 0.1
71
+ remove_unused_columns: false
72
+
73
+ chat_template: chatml
74
+ datasets:
75
+ - path: lightblue/mitsu_tophalf_borda
76
+ type: orpo.chat_template
77
+ conversation: llama-3
78
+ dataset_prepared_path: /workspace/llm_training/axolotl/llama3-multilingual-orpo/prepared_mitsu_half_borda
79
+ val_set_size: 0.02
80
+ output_dir: /workspace/llm_training/axolotl/llama3-multilingual-orpo/output_mitsu_half_borda
81
+
82
+ sequence_len: 8192
83
+ sample_packing: false
84
+ pad_to_sequence_len: true
85
+
86
+ use_wandb: true
87
+ wandb_project: axolotl
88
+ wandb_entity: peterd
89
+ wandb_name: mitsu_half_borda
90
+
91
+ gradient_accumulation_steps: 8
92
+ micro_batch_size: 1
93
+ num_epochs: 1
94
+ optimizer: paged_adamw_8bit
95
+ lr_scheduler: cosine
96
+ learning_rate: 8e-6
97
+
98
+ train_on_inputs: false
99
+ group_by_length: false
100
+ bf16: auto
101
+ fp16:
102
+ tf32: false
103
+
104
+ gradient_checkpointing: true
105
+ gradient_checkpointing_kwargs:
106
+ use_reentrant: false
107
+ early_stopping_patience:
108
+ resume_from_checkpoint:
109
+ logging_steps: 1
110
+ xformers_attention:
111
+ flash_attention: true
112
+
113
+ warmup_steps: 10
114
+ evals_per_epoch: 20
115
+ eval_table_size:
116
+ saves_per_epoch: 1
117
+ debug:
118
+ deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
119
+ weight_decay: 0.0
120
+ special_tokens:
121
+ pad_token: <|end_of_text|>
122
+ ```
123
+
124
+ </details><br>
125
+
126
+ # workspace/llm_training/axolotl/llama3-multilingual-orpo/output_mitsu_half_borda
127
+
128
+ This model is a fine-tuned version of [lightblue/suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual) on the None dataset.
129
+ It achieves the following results on the evaluation set:
130
+ - Loss: 0.0935
131
+
132
+ ## Model description
133
+
134
+ More information needed
135
+
136
+ ## Intended uses & limitations
137
+
138
+ More information needed
139
+
140
+ ## Training and evaluation data
141
+
142
+ More information needed
143
+
144
+ ## Training procedure
145
+
146
+ ### Training hyperparameters
147
+
148
+ The following hyperparameters were used during training:
149
+ - learning_rate: 8e-06
150
+ - train_batch_size: 1
151
+ - eval_batch_size: 1
152
+ - seed: 42
153
+ - distributed_type: multi-GPU
154
+ - num_devices: 4
155
+ - gradient_accumulation_steps: 8
156
+ - total_train_batch_size: 32
157
+ - total_eval_batch_size: 4
158
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
159
+ - lr_scheduler_type: cosine
160
+ - lr_scheduler_warmup_steps: 10
161
+ - num_epochs: 1
162
+
163
+ ### Training results
164
+
165
+ | Training Loss | Epoch | Step | Validation Loss |
166
+ |:-------------:|:-----:|:----:|:---------------:|
167
+ | 7.6299 | 0.02 | 1 | 7.7014 |
168
+ | 7.041 | 0.07 | 3 | 3.9786 |
169
+ | 0.6089 | 0.15 | 6 | 0.1393 |
170
+ | 0.1308 | 0.22 | 9 | 0.1244 |
171
+ | 0.1051 | 0.29 | 12 | 0.1112 |
172
+ | 0.1021 | 0.36 | 15 | 0.1063 |
173
+ | 0.0861 | 0.44 | 18 | 0.1026 |
174
+ | 0.1031 | 0.51 | 21 | 0.0979 |
175
+ | 0.0996 | 0.58 | 24 | 0.0967 |
176
+ | 0.0923 | 0.65 | 27 | 0.0960 |
177
+ | 0.1025 | 0.73 | 30 | 0.0944 |
178
+ | 0.1103 | 0.8 | 33 | 0.0939 |
179
+ | 0.0919 | 0.87 | 36 | 0.0937 |
180
+ | 0.104 | 0.94 | 39 | 0.0935 |
181
+
182
+
183
+ ### Framework versions
184
+
185
+ - Transformers 4.38.2
186
+ - Pytorch 2.2.1+cu121
187
+ - Datasets 2.18.0
188
+ - Tokenizers 0.15.0
189
+
190
+ # How to cite
191
+
192
+ ```tex
193
+ @article{devine2024sure,
194
+ title={Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets},
195
+ author={Devine, Peter},
196
+ journal={arXiv preprint arXiv:2405.18952},
197
+ year={2024}
198
+ }
199
+ ```
200
+
201
+ # Developer
202
+
203
+ Peter Devine - ([ptrdvn](https://huggingface.co/ptrdvn))