Update README.md
README.md
@@ -225,7 +225,7 @@ print(tokenizer.decode(response, skip_special_tokens=True))
 ## Training Details
 
 ### Supervised fine-tuning
-SFT on top of Qwen2.5-
+SFT on top of Qwen2.5-72B using axolotl (https://github.com/axolotl-ai-cloud/axolotl).
 
 We used Deepspeed's Zero-3 distributed training using the following hardware:
 
@@ -276,7 +276,7 @@ The training set consists of around 1.8B tokens, having 3 different types of data:
 - Gradient accumulation steps: 4
 
 ### Model Merging
-The model trained was merged with the Qwen2.5-
+The model trained was merged with the Qwen2.5-72B-Instruct model using the DARE_TIES technique. [Mergekit](https://github.com/arcee-ai/mergekit) was used to conduct the merging.
 
 ### Model Alignment
 The model is aligned using the Direct Preference Optimization (DPO) technique through a two-step process:
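For readers who want to reproduce a comparable SFT setup, the sketch below is a minimal, hypothetical axolotl config illustrating the pieces named in the diff (Qwen2.5-72B base model, DeepSpeed ZeRO-3, gradient accumulation of 4); the dataset path and all other hyperparameters are placeholders, not the released recipe.

```yaml
# Hypothetical axolotl SFT config sketch. Only base_model, ZeRO-3 and
# gradient_accumulation_steps reflect the README; everything else is a placeholder.
base_model: Qwen/Qwen2.5-72B
datasets:
  - path: ./data/sft_mixture.jsonl   # placeholder for the ~1.8B-token training mixture
    type: chat_template              # assumed conversational format
sequence_len: 8192                   # assumed
gradient_accumulation_steps: 4       # stated in the README
micro_batch_size: 1                  # assumed
num_epochs: 1                        # assumed
learning_rate: 1.0e-5                # assumed
bf16: true
flash_attention: true
deepspeed: deepspeed_configs/zero3_bf16.json   # ZeRO-3 config shipped with axolotl
output_dir: ./qwen2.5-72b-sft
```

With axolotl installed, a config like this is typically launched with `accelerate launch -m axolotl.cli.train config.yml` across the hardware listed in the README.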
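Similarly, the DARE_TIES merge described under "Model Merging" can be sketched as a mergekit YAML config; the checkpoint path, density and weight values below are illustrative assumptions, not the actual merge recipe.

```yaml
# Hypothetical mergekit config for a DARE_TIES merge of the SFT checkpoint with
# Qwen2.5-72B-Instruct; the local path and density/weight values are placeholders.
merge_method: dare_ties
base_model: Qwen/Qwen2.5-72B           # assumed shared base of both models
models:
  - model: ./qwen2.5-72b-sft           # placeholder path to the fine-tuned model
    parameters:
      density: 0.5
      weight: 0.5
  - model: Qwen/Qwen2.5-72B-Instruct
    parameters:
      density: 0.5
      weight: 0.5
dtype: bfloat16
```

Mergekit would then produce the merged model with `mergekit-yaml merge_config.yml ./merged-model`.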