simonycl committed on
Commit 8c0375c · verified · 1 Parent(s): 3f3498d

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,11 +1,12 @@
 ---
+library_name: transformers
 license: llama3
 base_model: meta-llama/Meta-Llama-3-8B-Instruct
 tags:
 - alignment-handbook
 - generated_from_trainer
 datasets:
-- simonycl/llama3-ultrafeedback-annotate-judge-5
+- simonycl/Llama-3-8B-Instruct-ultrafeedback-judge-5-annotate
 model-index:
 - name: llama-3-8b-instruct-agg-judge
   results: []
@@ -16,17 +17,17 @@ should probably proofread and complete it, then remove this comment. -->
 
 # llama-3-8b-instruct-agg-judge
 
-This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the simonycl/llama3-ultrafeedback-annotate-judge-5 dataset.
+This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the simonycl/Llama-3-8B-Instruct-ultrafeedback-judge-5-annotate dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6314
-- Rewards/chosen: -1.4785
-- Rewards/rejected: -1.8808
-- Rewards/accuracies: 0.6402
-- Rewards/margins: 0.4022
-- Logps/rejected: -332.7695
-- Logps/chosen: -294.6159
-- Logits/rejected: -1.5412
-- Logits/chosen: -1.5384
+- Loss: 0.5049
+- Rewards/chosen: -1.4456
+- Rewards/rejected: -2.1220
+- Rewards/accuracies: 0.7933
+- Rewards/margins: 0.6764
+- Logps/rejected: -356.5251
+- Logps/chosen: -307.2105
+- Logits/rejected: -1.1659
+- Logits/chosen: -0.9705
 
 ## Model description
 
@@ -46,14 +47,14 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-07
-- train_batch_size: 2
-- eval_batch_size: 4
+- train_batch_size: 1
+- eval_batch_size: 1
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
-- gradient_accumulation_steps: 16
+- gradient_accumulation_steps: 32
 - total_train_batch_size: 128
-- total_eval_batch_size: 16
+- total_eval_batch_size: 4
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
@@ -63,12 +64,12 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.5551        | 0.8550 | 400  | 0.6314          | -1.4785        | -1.8808          | 0.6402             | 0.4022          | -332.7695      | -294.6159    | -1.5412         | -1.5384       |
+| 0.5505        | 0.8528 | 400  | 0.5049          | -1.4456        | -2.1220          | 0.7933             | 0.6764          | -356.5251      | -307.2105    | -1.1659         | -0.9705       |
 
 
 ### Framework versions
 
-- Transformers 4.44.0
+- Transformers 4.44.2
 - Pytorch 2.4.0+cu121
 - Datasets 2.21.0
 - Tokenizers 0.19.1
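Both the old and new hyperparameter sets in the README diff keep the same total train batch size: the per-device batch was halved while gradient accumulation doubled. A quick sketch of the standard HF Trainer relation (values taken from the README above) to confirm the arithmetic:

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Total train batch size = per-device batch x accumulation steps x devices."""
    return per_device * grad_accum * num_devices

# Old config (2, 16, 4) and new config (1, 32, 4) both yield 128,
# matching total_train_batch_size in both versions of the README.
old_total = effective_batch_size(2, 16, 4)
new_total = effective_batch_size(1, 32, 4)
```

The change trades per-device memory for more accumulation steps, leaving the optimization schedule equivalent.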
all_results.json CHANGED
@@ -1,22 +1,22 @@
 {
-    "epoch": 0.9982631930527722,
-    "eval_logits/chosen": -1.5078670978546143,
-    "eval_logits/rejected": -1.5112224817276,
-    "eval_logps/chosen": -295.1907958984375,
-    "eval_logps/rejected": -333.72509765625,
-    "eval_loss": 0.6313705444335938,
-    "eval_rewards/accuracies": 0.6361788511276245,
-    "eval_rewards/chosen": -1.484257698059082,
-    "eval_rewards/margins": 0.40605515241622925,
-    "eval_rewards/rejected": -1.890312671661377,
-    "eval_runtime": 269.2761,
-    "eval_samples": 1961,
-    "eval_samples_per_second": 7.282,
-    "eval_steps_per_second": 0.457,
+    "epoch": 0.9999333733093477,
+    "eval_logits/chosen": -0.9731917977333069,
+    "eval_logits/rejected": -1.166278600692749,
+    "eval_logps/chosen": -306.48046875,
+    "eval_logps/rejected": -356.2023010253906,
+    "eval_loss": 0.503829836845398,
+    "eval_rewards/accuracies": 0.7933906316757202,
+    "eval_rewards/chosen": -1.4382753372192383,
+    "eval_rewards/margins": 0.6804661750793457,
+    "eval_rewards/rejected": -2.118741512298584,
+    "eval_runtime": 11557.6821,
+    "eval_samples": 60035,
+    "eval_samples_per_second": 5.194,
+    "eval_steps_per_second": 1.299,
     "total_flos": 0.0,
-    "train_loss": 0.5881322287900544,
-    "train_runtime": 16866.5046,
-    "train_samples": 59875,
-    "train_samples_per_second": 3.55,
-    "train_steps_per_second": 0.028
+    "train_loss": 0.5891387982409138,
+    "train_runtime": 37343.5856,
+    "train_samples": 60035,
+    "train_samples_per_second": 1.608,
+    "train_steps_per_second": 0.013
 }
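The rewards/* keys above follow the naming used by preference-optimization trainers (e.g. TRL's DPO trainer), where the reported margin is simply rewards/chosen minus rewards/rejected. A small sanity check on the new all_results.json values:

```python
# Eval metrics copied from the new all_results.json above.
chosen = -1.4382753372192383
rejected = -2.118741512298584
margin = 0.6804661750793457

# The reported margin should equal chosen - rejected.
diff = chosen - rejected
```

The values are internally consistent to floating-point precision, which is a useful quick check when comparing runs.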
config.json CHANGED
@@ -23,7 +23,7 @@
   "rope_theta": 500000.0,
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
-  "transformers_version": "4.44.0",
+  "transformers_version": "4.44.2",
   "use_cache": true,
   "vocab_size": 128256
 }
eval_results.json CHANGED
@@ -1,16 +1,16 @@
 {
-    "epoch": 0.9982631930527722,
-    "eval_logits/chosen": -1.5078670978546143,
-    "eval_logits/rejected": -1.5112224817276,
-    "eval_logps/chosen": -295.1907958984375,
-    "eval_logps/rejected": -333.72509765625,
-    "eval_loss": 0.6313705444335938,
-    "eval_rewards/accuracies": 0.6361788511276245,
-    "eval_rewards/chosen": -1.484257698059082,
-    "eval_rewards/margins": 0.40605515241622925,
-    "eval_rewards/rejected": -1.890312671661377,
-    "eval_runtime": 269.2761,
-    "eval_samples": 1961,
-    "eval_samples_per_second": 7.282,
-    "eval_steps_per_second": 0.457
+    "epoch": 0.9999333733093477,
+    "eval_logits/chosen": -0.9731917977333069,
+    "eval_logits/rejected": -1.166278600692749,
+    "eval_logps/chosen": -306.48046875,
+    "eval_logps/rejected": -356.2023010253906,
+    "eval_loss": 0.503829836845398,
+    "eval_rewards/accuracies": 0.7933906316757202,
+    "eval_rewards/chosen": -1.4382753372192383,
+    "eval_rewards/margins": 0.6804661750793457,
+    "eval_rewards/rejected": -2.118741512298584,
+    "eval_runtime": 11557.6821,
+    "eval_samples": 60035,
+    "eval_samples_per_second": 5.194,
+    "eval_steps_per_second": 1.299
 }
generation_config.json CHANGED
@@ -8,5 +8,5 @@
   "max_length": 4096,
   "temperature": 0.6,
   "top_p": 0.9,
-  "transformers_version": "4.44.0"
+  "transformers_version": "4.44.2"
 }
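generation_config.json keeps temperature 0.6 and top_p 0.9 across the commit. As a reference for what those two knobs do to a next-token distribution, here is a pure-Python sketch (made-up logits, not the transformers implementation): temperature rescales the logits before softmax, then nucleus (top-p) filtering keeps the smallest set of tokens whose cumulative probability reaches top_p and renormalizes.

```python
import math

def top_p_probs(logits, temperature=0.6, top_p=0.9):
    """Apply temperature, softmax, then nucleus (top-p) filtering.

    Returns a dict mapping kept token indices to renormalized probabilities.
    """
    # Temperature-scaled, numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the highest-probability tokens until cumulative mass >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the kept set.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# Hypothetical 4-token vocabulary; with T=0.6 the distribution sharpens,
# so only the top couple of tokens survive the 0.9 nucleus cutoff.
dist = top_p_probs([2.0, 1.0, 0.2, -1.0])
```

Lowering the temperature below 1.0 concentrates mass on the top tokens, so fewer of them survive the nucleus cutoff; this is why 0.6 / 0.9 yields fairly conservative sampling.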
model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d59998e606338c227db713796a769dd05db4458be1be180128e612a003ddcf5e
+oid sha256:aadc0d089c01c51929020ecb50c2681dbe21c6798b2542a82bb3fb4a7391f490
 size 4976698672
model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:69ab9ce83956487448a3c54fa5c473cb4c1699e99b1eec6529a448bea9082790
+oid sha256:799708995ad93f68aa9f73c9fdd0382bf183da7bd6496f320b063ee28ce7f144
 size 4999802720
model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a282710b40678e35b23e4c01bd51d4997708c501941b4fdcaa015bdee2e823bc
+oid sha256:948d4df220f4c7139f9b504c8eec9bd66b959994c3cdf878bbff15e5dbbd62dd
 size 4915916176
model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:df32c2bf8a67fdf19919b66b28ed3a6478d8bde44defa62902bee27fdb5efdad
+oid sha256:74cc414c0293f182445de8a80f8c3ddd96095a8d563049d72584d4a715e6831d
 size 1168138808
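The .safetensors entries above are Git LFS pointer files: the repository stores only a sha256 oid and a byte size, while the actual weights live in LFS storage. A minimal shell sketch (using a tiny stand-in file, not the real shards) of how a downloaded file can be checked against its pointer:

```shell
#!/bin/sh
# Stand-in for a downloaded shard; in practice this would be e.g.
# model-00001-of-00004.safetensors fetched from the Hub.
printf 'hello' > shard.bin

# The sha256 digest must match the pointer's "oid sha256:<hex>" line.
oid=$(sha256sum shard.bin | awk '{print $1}')

# The byte count must match the pointer's "size <bytes>" line.
size=$(wc -c < shard.bin)

echo "oid sha256:$oid"
echo "size $size"
```

Note that the sizes are unchanged in this commit while all four oids differ, i.e. the shards hold new weight values of identical shape and dtype.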
train_results.json CHANGED
@@ -1,9 +1,9 @@
 {
-    "epoch": 0.9982631930527722,
+    "epoch": 0.9999333733093477,
     "total_flos": 0.0,
-    "train_loss": 0.5881322287900544,
-    "train_runtime": 16866.5046,
-    "train_samples": 59875,
-    "train_samples_per_second": 3.55,
-    "train_steps_per_second": 0.028
+    "train_loss": 0.5891387982409138,
+    "train_runtime": 37343.5856,
+    "train_samples": 60035,
+    "train_samples_per_second": 1.608,
+    "train_steps_per_second": 0.013
 }
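The throughput figures in train_results.json are derived quantities: samples per second is just the sample count over the wall-clock runtime. A quick consistency check on the new run's numbers:

```python
# Figures copied from the new train_results.json above.
train_samples = 60035
train_runtime_s = 37343.5856

# The trainer reports samples/runtime rounded to 3 decimals.
sps = round(train_samples / train_runtime_s, 3)
```

The new run processes roughly half as many samples per second as the old one (1.608 vs 3.55), consistent with the smaller per-device batch and doubled gradient accumulation.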
trainer_state.json CHANGED
The diff for this file is too large to render.
 
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fca683b1c8da8ae65b1c44ba6260b682c262d2b793b0a08fc2aaee2b7d0426bd
+oid sha256:9a6681a55e194ed88e0b0738936a1587e12b39b5d17f641aed90732330e377db
 size 7544