jtatman committed on
Commit 2959352
1 Parent(s): 353621e

End of training

Files changed (4)
  1. README.md +7 -4
  2. adapter_model.safetensors +2 -2
  3. config.json +1 -1
  4. pytorch_model.bin +1 -1
README.md CHANGED
@@ -61,7 +61,7 @@ output_dir: ./outputs/lora-alpaca-pythia-160m-storytelling
  gradient_accumulation_steps: 16
  micro_batch_size: 1
  num_epochs: 3
- learning_rate: 0.0006
+ learning_rate: 0.001
  lr_scheduler: cosine_with_restarts
  #cosine_min_lr_ratio: 0.1
  train_on_inputs: false
@@ -75,7 +75,7 @@ xformers_attention: true
  optimizer: paged_adamw_8bit
  gpu_memory_limit: 8GiB
  hub_model_id: jtatman/pythia-160m-storytelling
- early_stopping_patience: 2
+ early_stopping_patience: 3
  #resume_from_checkpoint: outputs/lora-alpaca-pythia-125m/checkpoint-51040
  auto_resume_from_checkpoints: true
  local_rank:
@@ -98,7 +98,7 @@ tokens:

  This model is a fine-tuned version of [EleutherAI/pythia-160m-deduped](https://huggingface.co/EleutherAI/pythia-160m-deduped) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 5.0363
+ - Loss: 7.3539

  ## Model description

@@ -117,7 +117,7 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.0006
+ - learning_rate: 0.001
  - train_batch_size: 1
  - eval_batch_size: 1
  - seed: 42
@@ -136,6 +136,9 @@ The following hyperparameters were used during training:
  | 4.2012 | 0.2348 | 200 | 4.1556 |
  | 4.4185 | 0.4696 | 400 | 4.8159 |
  | 5.0973 | 0.7043 | 600 | 5.0363 |
+ | 8.1159 | 0.9391 | 800 | 8.4966 |
+ | 6.7656 | 1.1739 | 1000 | 7.1575 |
+ | 7.0548 | 1.4087 | 1200 | 7.3539 |


  ### Framework versions
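
For orientation, here is a minimal sketch of how the settings touched by this commit (learning_rate 0.001, cosine_with_restarts scheduling, the paged_adamw_8bit optimizer, early_stopping_patience 3) would read if expressed with the transformers Trainer API instead of the axolotl YAML above. It is an illustration under that assumption, not the script used for this run; dataset handling is omitted.

```python
# Hedged sketch only: the run in this repo was launched with axolotl, but the
# changed hyperparameters map roughly onto transformers' Trainer API like this.
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="./outputs/lora-alpaca-pythia-160m-storytelling",
    learning_rate=1e-3,                        # raised from 6e-4 in this commit
    lr_scheduler_type="cosine_with_restarts",  # lr_scheduler: cosine_with_restarts
    optim="paged_adamw_8bit",                  # optimizer: paged_adamw_8bit
    per_device_train_batch_size=1,             # micro_batch_size: 1
    gradient_accumulation_steps=16,
    num_train_epochs=3,
    eval_strategy="steps",
    eval_steps=200,                            # matches the eval rows in the table
    save_strategy="steps",
    save_steps=200,
    load_best_model_at_end=True,               # required for early stopping
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# Patience was raised from 2 to 3 evaluations without improvement.
early_stop = EarlyStoppingCallback(early_stopping_patience=3)
# A Trainer would then be constructed with `args` and `callbacks=[early_stop]`.
```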
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2e82a4ad5a2cd99d4cd86f4e5bd79962b304eb0c1833326168080d4dbdf35a29
- size 159266376
+ oid sha256:e44ce263e6fd885f50d82ca515b9325375b43ee36ededb75acf161ce88bc2e41
+ size 48
config.json CHANGED
@@ -22,7 +22,7 @@
  "rotary_emb_base": 10000,
  "rotary_pct": 0.25,
  "tie_word_embeddings": false,
- "torch_dtype": "float16",
+ "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": false,
  "use_parallel_residual": true,
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:168d7d3a0afe4d5734fa7304faf1fe11e36d3bc39c475402248255244c303708
+ oid sha256:9dec837ce83e8a9edef8bb8c740c7c5826b1c3b830849b2c1c30a0e610b54bc6
  size 324696090