Update README.md
README.md
base_model: EleutherAI/polyglot-ko-12.8b
---

This model is an instruct-tuned polyglot-ko-12.8b, trained on 10% of the [KULLM, OIG, KoAlpaca] instruction datasets (29 training steps).
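A minimal usage sketch with `transformers` (the repo id below is a placeholder for this model's actual Hub path, and the KULLM/KoAlpaca-style `### 질문:` / `### 답변:` (Question/Answer) prompt format is an assumption, not a confirmed detail of this model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with this model's actual Hub path.
model_id = "polyglot-ko-12.8b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~26 GB in fp16 for 12.8B parameters
    device_map="auto",          # requires accelerate
)

# Assumed KoAlpaca/KULLM-style prompt: "### Question: What is the capital of Korea?\n### Answer:"
prompt = "### 질문: 한국의 수도는 어디인가요?\n\n### 답변:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```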
## Training hyperparameters
- learning_rate: 5e-5
- seed: 42
- distributed_type: multi-GPU (A100 40G) + CPU offloading (512 GB)
- num_devices: 1
- train_batch_size: 4
- gradient_accumulation_steps: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2.0
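With train_batch_size 4 and 16 gradient-accumulation steps on one device, the effective batch size is 4 × 16 = 64 sequences per optimizer step. As a rough sketch, these settings map onto `transformers` `TrainingArguments` as below; the DeepSpeed ZeRO-3 offload config is an assumption about how the multi-GPU-plus-CPU-offloading setup was realized on a single A100 40G, since the actual config is not published:

```python
from transformers import TrainingArguments

# Sketch of a DeepSpeed ZeRO-3 config offloading optimizer state and parameters
# to CPU -- one plausible way to fit a 12.8B model on a single A100 40G with
# 512 GB of host RAM. The actual config used for this model is not published.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
    "fp16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="polyglot-ko-12.8b-instruct",  # hypothetical output path
    learning_rate=5e-5,
    seed=42,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    num_train_epochs=2.0,
    lr_scheduler_type="linear",
    fp16=True,
    deepspeed=ds_config,  # accepts a dict or a path to a JSON config file
)
# The Adam betas/epsilon listed above are the TrainingArguments defaults
# (adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8), so no override is needed.
```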
## Framework versions
- Transformers 4.35.0
- PyTorch 2.0.1+cu117
- Datasets 2.14.6
- deepspeed 0.11.1
- accelerate 0.24.1