simonycl committed on
Commit bd9a56a · verified · 1 Parent(s): 821b328

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,11 +1,12 @@
 ---
+library_name: transformers
 license: llama3
 base_model: meta-llama/Meta-Llama-3-8B-Instruct
 tags:
 - alignment-handbook
 - generated_from_trainer
 datasets:
-- simonycl/ultrafeedback_annotate_single_judge
+- simonycl/Meta-Llama-3-8B-Instruct_ultrafeedback_single_judge
 model-index:
 - name: llama-3-8b-instruct-single-judge
   results: []
@@ -16,17 +17,17 @@ should probably proofread and complete it, then remove this comment. -->
 
 # llama-3-8b-instruct-single-judge
 
-This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the simonycl/ultrafeedback_annotate_single_judge dataset.
+This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the simonycl/Meta-Llama-3-8B-Instruct_ultrafeedback_single_judge dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6579
-- Rewards/chosen: -0.9427
-- Rewards/rejected: -1.1235
-- Rewards/accuracies: 0.5813
-- Rewards/margins: 0.1808
-- Logps/rejected: -257.0670
-- Logps/chosen: -241.0294
-- Logits/rejected: -1.5009
-- Logits/chosen: -1.5001
+- Loss: 0.6593
+- Rewards/chosen: -1.3185
+- Rewards/rejected: -1.5107
+- Rewards/accuracies: 0.5935
+- Rewards/margins: 0.1922
+- Logps/rejected: -301.4192
+- Logps/chosen: -283.3882
+- Logits/rejected: -1.3226
+- Logits/chosen: -1.3593
 
 ## Model description
 
@@ -46,14 +47,14 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-07
-- train_batch_size: 2
-- eval_batch_size: 4
+- train_batch_size: 1
+- eval_batch_size: 2
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
-- gradient_accumulation_steps: 16
+- gradient_accumulation_steps: 32
 - total_train_batch_size: 128
-- total_eval_batch_size: 16
+- total_eval_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
@@ -63,12 +64,12 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6077 | 0.8550 | 400 | 0.6579 | -0.9427 | -1.1235 | 0.5813 | 0.1808 | -257.0670 | -241.0294 | -1.5009 | -1.5001 |
+| 0.5714 | 0.8528 | 400 | 0.6593 | -1.3185 | -1.5107 | 0.5935 | 0.1922 | -301.4192 | -283.3882 | -1.3226 | -1.3593 |
 
 
 ### Framework versions
 
-- Transformers 4.44.0
+- Transformers 4.44.2
 - Pytorch 2.4.0+cu121
 - Datasets 2.21.0
 - Tokenizers 0.19.1
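The hyperparameter hunk above is internally consistent: the per-device train batch size halves (2 → 1) while gradient accumulation doubles (16 → 32), so the effective batch size stays at 128. A quick sanity check of that arithmetic (a sketch, assuming the usual per-device × devices × accumulation formula used by the transformers Trainer):

```python
# Effective train batch size = per-device batch size * num devices * grad accumulation steps.
def effective_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    return per_device * num_devices * grad_accum

assert effective_batch_size(2, 4, 16) == 128  # old configuration
assert effective_batch_size(1, 4, 32) == 128  # new configuration, same total
```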
all_results.json CHANGED
@@ -1,22 +1,22 @@
 {
-    "epoch": 0.9982631930527722,
-    "eval_logits/chosen": -1.5050362348556519,
-    "eval_logits/rejected": -1.5059466361999512,
-    "eval_logps/chosen": -240.73709106445312,
-    "eval_logps/rejected": -256.97796630859375,
-    "eval_loss": 0.6570853590965271,
-    "eval_rewards/accuracies": 0.5813007950782776,
-    "eval_rewards/chosen": -0.9397876858711243,
-    "eval_rewards/margins": 0.18286575376987457,
-    "eval_rewards/rejected": -1.1226533651351929,
-    "eval_runtime": 149.0179,
-    "eval_samples": 1961,
-    "eval_samples_per_second": 13.159,
-    "eval_steps_per_second": 0.825,
+    "epoch": 0.9999333733093477,
+    "eval_logits/chosen": -1.3400534391403198,
+    "eval_logits/rejected": -1.3050051927566528,
+    "eval_logps/chosen": -281.1357727050781,
+    "eval_logps/rejected": -299.5000915527344,
+    "eval_loss": 0.6583243012428284,
+    "eval_rewards/accuracies": 0.5955284833908081,
+    "eval_rewards/chosen": -1.2959448099136353,
+    "eval_rewards/margins": 0.1955154538154602,
+    "eval_rewards/rejected": -1.4914603233337402,
+    "eval_runtime": 246.7004,
+    "eval_samples": 1962,
+    "eval_samples_per_second": 7.953,
+    "eval_steps_per_second": 0.997,
     "total_flos": 0.0,
-    "train_loss": 0.6308592481327261,
-    "train_runtime": 14534.736,
-    "train_samples": 59875,
-    "train_samples_per_second": 4.119,
-    "train_steps_per_second": 0.032
+    "train_loss": 0.6122812041595801,
+    "train_runtime": 25569.0191,
+    "train_samples": 60035,
+    "train_samples_per_second": 2.348,
+    "train_steps_per_second": 0.018
 }
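The rewards/logps/logits fields match the logging convention of preference-optimization trainers such as TRL's DPOTrainer (an assumption; the card itself only carries the alignment-handbook tag). Under that convention the margin is the chosen reward minus the rejected reward, which the new numbers satisfy up to float32 rounding:

```python
# eval_rewards/margins should equal eval_rewards/chosen - eval_rewards/rejected.
chosen = -1.2959448099136353
rejected = -1.4914603233337402
margin = 0.1955154538154602

# Agreement to ~6e-8; the trainer accumulates these metrics in float32.
assert abs((chosen - rejected) - margin) < 1e-6
```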
config.json CHANGED
@@ -23,7 +23,7 @@
     "rope_theta": 500000.0,
     "tie_word_embeddings": false,
     "torch_dtype": "bfloat16",
-    "transformers_version": "4.44.0",
+    "transformers_version": "4.44.2",
     "use_cache": true,
     "vocab_size": 128256
 }
eval_results.json CHANGED
@@ -1,16 +1,16 @@
 {
-    "epoch": 0.9982631930527722,
-    "eval_logits/chosen": -1.5050362348556519,
-    "eval_logits/rejected": -1.5059466361999512,
-    "eval_logps/chosen": -240.73709106445312,
-    "eval_logps/rejected": -256.97796630859375,
-    "eval_loss": 0.6570853590965271,
-    "eval_rewards/accuracies": 0.5813007950782776,
-    "eval_rewards/chosen": -0.9397876858711243,
-    "eval_rewards/margins": 0.18286575376987457,
-    "eval_rewards/rejected": -1.1226533651351929,
-    "eval_runtime": 149.0179,
-    "eval_samples": 1961,
-    "eval_samples_per_second": 13.159,
-    "eval_steps_per_second": 0.825
+    "epoch": 0.9999333733093477,
+    "eval_logits/chosen": -1.3400534391403198,
+    "eval_logits/rejected": -1.3050051927566528,
+    "eval_logps/chosen": -281.1357727050781,
+    "eval_logps/rejected": -299.5000915527344,
+    "eval_loss": 0.6583243012428284,
+    "eval_rewards/accuracies": 0.5955284833908081,
+    "eval_rewards/chosen": -1.2959448099136353,
+    "eval_rewards/margins": 0.1955154538154602,
+    "eval_rewards/rejected": -1.4914603233337402,
+    "eval_runtime": 246.7004,
+    "eval_samples": 1962,
+    "eval_samples_per_second": 7.953,
+    "eval_steps_per_second": 0.997
 }
generation_config.json CHANGED
@@ -8,5 +8,5 @@
     "max_length": 4096,
     "temperature": 0.6,
     "top_p": 0.9,
-    "transformers_version": "4.44.0"
+    "transformers_version": "4.44.2"
 }
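generation_config.json holds the sampling defaults (temperature 0.6, top_p 0.9, max_length 4096) that transformers applies when generate() runs without explicit overrides. A minimal sketch of reading them back; the repo id is inferred from the model-index name above and may differ:

```python
from transformers import GenerationConfig

# Repo id assumed from the model-index name; adjust if the model lives elsewhere.
gen_cfg = GenerationConfig.from_pretrained("simonycl/llama-3-8b-instruct-single-judge")
print(gen_cfg.temperature, gen_cfg.top_p, gen_cfg.max_length)  # expected: 0.6 0.9 4096
```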
model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:34c1370579373aa55ccd5e21dfa1f56196494be4b076a01f0037672f6561a7d0
+oid sha256:e6e4d90ca23309ff8eac136cabb590b316716750495a0c31e5350acb9d4ef75d
 size 4976698672

model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:28eea8cc57121ca344c1a63ce2f61f60cd078135150688b9df5c9bf900867045
+oid sha256:c74e3dd2d01e552f3bc27d13dd658c744b28ef135ddf094fda73eca9df516d24
 size 4999802720

model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:eefafc77324869f00b665dae887b5d8f55fa7bf6c732e05d339422a27cf8eb8a
+oid sha256:977482c84531661824fc55aeca30e8ec58c3d023d938872cecf3151167625922
 size 4915916176

model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5465ffc08b896aa772497e498d4536661fa9f6c7971873d87f9e3168b46b634f
+oid sha256:268bf2e3e02a5198884981188188fe3c144ba230d27dab75e41140618193b2e8
 size 1168138808
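Each safetensors entry above is a Git LFS pointer, not the weights themselves: the repo tracks only the spec version, the sha256 object id, and the byte size, and the diff shows every shard's contents changed while its size stayed identical. A minimal sketch for verifying a locally downloaded shard against its pointer (the local file path is an assumption):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream the file in 1 MiB chunks so large shards never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Expected oid taken from the new pointer for shard 1 of 4.
expected = "e6e4d90ca23309ff8eac136cabb590b316716750495a0c31e5350acb9d4ef75d"
assert sha256_of("model-00001-of-00004.safetensors") == expected
```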
train_results.json CHANGED
@@ -1,9 +1,9 @@
 {
-    "epoch": 0.9982631930527722,
+    "epoch": 0.9999333733093477,
     "total_flos": 0.0,
-    "train_loss": 0.6308592481327261,
-    "train_runtime": 14534.736,
-    "train_samples": 59875,
-    "train_samples_per_second": 4.119,
-    "train_steps_per_second": 0.032
+    "train_loss": 0.6122812041595801,
+    "train_runtime": 25569.0191,
+    "train_samples": 60035,
+    "train_samples_per_second": 2.348,
+    "train_steps_per_second": 0.018
 }
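The throughput fields are mutually consistent: samples per second is train_samples divided by train_runtime. A quick check of the new values:

```python
# train_samples_per_second is train_samples / train_runtime, reported to 3 decimals.
train_samples = 60035
train_runtime = 25569.0191  # seconds
assert round(train_samples / train_runtime, 3) == 2.348
```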
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff
 
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:863ad322a5f2b29e0af779336a3a89919d7e835e16f62fe3c599d3391e869051
+oid sha256:789be51aeff227264050e7503678d5a40962ef4754d2aa42eefdb72c95b40441
 size 7544