SystemAdmin123 committed
Commit 7542681 · verified · 1 Parent(s): f9a47e6

End of training

Files changed (1):
  1. README.md +12 -12
README.md CHANGED
@@ -20,7 +20,7 @@ should probably proofread and complete it, then remove this comment. -->
  axolotl version: `0.6.0`
  ```yaml
  base_model: trl-internal-testing/tiny-random-LlamaForCausalLM
- batch_size: 256
+ batch_size: 128
  bf16: true
  chat_template: tokenizer_default_fallback_alpaca
  datasets:
@@ -45,7 +45,7 @@ hub_strategy: checkpoint
  learning_rate: 0.0002
  logging_steps: 10
  lr_scheduler: cosine
- max_steps: 20000
+ max_steps: 10000
  micro_batch_size: 32
  model_type: AutoModelForCausalLM
  num_epochs: 100
@@ -59,6 +59,8 @@ save_total_limit: 1
  sequence_len: 2048
  tokenizer_type: LlamaTokenizerFast
  torch_dtype: bf16
+ training_args_kwargs:
+   hub_private_repo: true
  trust_remote_code: true
  val_set_size: 0.1
  wandb_entity: ''
@@ -76,8 +78,6 @@ warmup_ratio: 0.05
  # test-repo

  This model is a fine-tuned version of [trl-internal-testing/tiny-random-LlamaForCausalLM](https://huggingface.co/trl-internal-testing/tiny-random-LlamaForCausalLM) on the argilla/databricks-dolly-15k-curated-en dataset.
- It achieves the following results on the evaluation set:
- - Loss: 9.6817

  ## Model description

@@ -101,19 +101,19 @@ The following hyperparameters were used during training:
  - eval_batch_size: 32
  - seed: 42
  - distributed_type: multi-GPU
- - num_devices: 8
- - total_train_batch_size: 256
- - total_eval_batch_size: 256
+ - num_devices: 4
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 128
  - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
- - training_steps: 0
+ - lr_scheduler_warmup_steps: 5
+ - training_steps: 100

  ### Training results

- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-------:|:----:|:---------------:|
- | No log | 0.3333 | 1 | 10.3756 |
- | 9.6746 | 66.6667 | 200 | 9.6817 |
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | No log | 0.1667 | 1 | 10.3764 |


  ### Framework versions
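
A note on the batch-size figures in this diff: the effective train batch size reported in the model card is the per-device micro_batch_size multiplied by the number of devices (and by gradient accumulation, which is assumed to be 1 here since it does not appear in the config shown). That is why moving from 8 GPUs to 4 with micro_batch_size: 32 halves total_train_batch_size from 256 to 128. A minimal sketch of that arithmetic:

```python
# Sketch of the effective-batch-size arithmetic behind this commit's change.
# gradient_accumulation_steps is an assumption (not in the config shown); it is
# taken as 1 because micro_batch_size * num_devices matches total_train_batch_size.

def effective_batch_size(micro_batch_size: int, num_devices: int,
                         gradient_accumulation_steps: int = 1) -> int:
    """Total examples consumed per optimizer step across all devices."""
    return micro_batch_size * num_devices * gradient_accumulation_steps

print(effective_batch_size(32, 8))  # 256 -- the old 8-device run
print(effective_batch_size(32, 4))  # 128 -- the new 4-device run in this commit
```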
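
The other substantive change is the new training_args_kwargs block. In axolotl, keys under training_args_kwargs are, to my understanding, forwarded to the underlying transformers TrainingArguments, so the sketch below shows roughly what hub_private_repo: true corresponds to at that level. The output_dir value and the push_to_hub flag are illustrative assumptions, not taken from the config; hub_strategy: checkpoint comes from the hunk context above.

```python
# Rough transformers-level equivalent of the new `training_args_kwargs` entry.
# A sketch, assuming axolotl passes these keys straight through to TrainingArguments;
# output_dir and push_to_hub are illustrative, not from the config shown above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/test-repo",  # hypothetical local output path
    push_to_hub=True,                # assumed, so the hub_* options take effect
    hub_strategy="checkpoint",       # from the diff's hunk context (hub_strategy: checkpoint)
    hub_private_repo=True,           # the option added in this commit
)
```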