zera09/dpo_t5_2

@@ -18,15 +18,20 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [zera09/long_t5_4](https://huggingface.co/zera09/long_t5_4) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.1334
-- Rewards/chosen: -1.1874
-- Rewards/rejected: -7.1548
-- Rewards/accuracies: 0.8925
-- Rewards/margins: 5.9674
-- Logps/rejected: -17.1905
-- Logps/chosen: -29.3439
-- Logits/rejected: -16.5218
-- Logits/chosen: -16.4181
+- eval_loss: 0.3486
+- eval_runtime: 189.2688
+- eval_samples_per_second: 8.454
+- eval_steps_per_second: 1.057
+- eval_rewards/chosen: 0.4122
+- eval_rewards/rejected: -0.7810
+- eval_rewards/accuracies: 0.8781
+- eval_rewards/margins: 1.1932
+- eval_logps/rejected: -6.7047
+- eval_logps/chosen: -26.4297
+- eval_logits/rejected: -19.0942
+- eval_logits/chosen: -18.7712
+- epoch: 2.3
+- step: 460
 ## Model description
@@ -45,24 +50,15 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 32
-- eval_batch_size: 16
+- learning_rate: 5e-07
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
-- num_epochs: 5
-### Training results
-| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
-|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.8121        | 1.0   | 200  | 0.1588          | -0.2876        | -4.8892          | 0.8900             | 4.6017          | -13.4145       | -27.8442     | -16.8889        | -16.7169      |
-| 0.8121        | 2.0   | 400  | 0.1411          | -0.8025        | -6.2684          | 0.8956             | 5.4660          | -15.7131       | -28.7023     | -16.6016        | -16.4757      |
-| 0.202         | 3.0   | 600  | 0.1357          | -1.0857        | -6.8817          | 0.8919             | 5.7960          | -16.7353       | -29.1744     | -16.5371        | -16.4257      |
-| 0.202         | 4.0   | 800  | 0.1337          | -1.1644        | -7.1028          | 0.8925             | 5.9384          | -17.1038       | -29.3055     | -16.5297        | -16.4245      |
-| 0.1133        | 5.0   | 1000 | 0.1334          | -1.1874        | -7.1548          | 0.8925             | 5.9674          | -17.1905       | -29.3439     | -16.5218        | -16.4181      |
+- training_steps: 1000
 ### Framework versions
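The metric names in both versions of the card (`rewards/chosen`, `rewards/rejected`, `rewards/margins`, `rewards/accuracies`, `logps/*`) match those logged by TRL's `DPOTrainer`, although the card does not name the trainer. Assuming that trainer with its default sigmoid DPO loss, a completion's reward is β times the difference between its log-probability under the policy and under the frozen reference model (here presumably `zera09/long_t5_4`), the margin is chosen reward minus rejected reward, the loss is the negative log-sigmoid of that margin, and accuracy is the fraction of pairs with a positive margin. The sketch below spells out those definitions; `beta=0.1` (TRL's default) and the toy log-probabilities are assumptions, since the card reports neither. Because margins are linear in the rewards, the reported means should satisfy chosen minus rejected equals margin, which holds for both runs (0.4122 - (-0.7810) = 1.1932 and -1.1874 - (-7.1548) = 5.9674).

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    # Log-probabilities of each completion under the policy being trained
    policy_chosen_logp: float
    policy_rejected_logp: float
    # The same quantities under the frozen reference model
    ref_chosen_logp: float
    ref_rejected_logp: float


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for large positive or negative x
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dpo_metrics(pairs: list[PreferencePair], beta: float = 0.1) -> dict[str, float]:
    """Batch means of the DPO quantities, assuming the sigmoid DPO loss (beta is an assumption)."""
    chosen = [beta * (p.policy_chosen_logp - p.ref_chosen_logp) for p in pairs]
    rejected = [beta * (p.policy_rejected_logp - p.ref_rejected_logp) for p in pairs]
    margins = [c - r for c, r in zip(chosen, rejected)]
    n = len(pairs)
    return {
        "loss": sum(-log_sigmoid(m) for m in margins) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/margins": sum(margins) / n,
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
    }


# Toy pair with made-up log-probabilities, purely for illustration.
toy = PreferencePair(-26.0, -7.0, -30.0, -6.0)
print(dpo_metrics([toy]))

# Consistency check on the means reported in the updated card.
chosen_mean, rejected_mean, margin_mean = 0.4122, -0.7810, 1.1932
assert abs((chosen_mean - rejected_mean) - margin_mean) < 1e-4
```

Accuracy and loss are not linear in the rewards, so unlike the margin they cannot be recovered from the reported means alone.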

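The new hyperparameters keep the effective batch size of the earlier run. With gradient accumulation, each optimizer step covers train_batch_size × gradient_accumulation_steps = 8 × 4 = 32 examples, matching the old train_batch_size of 32 (assuming a single device, since the card lists no device count). The eval entry at step 460 and epoch 2.3 implies 200 optimizer steps per epoch, the same as in the removed training-results table, so 1000 training steps again amounts to 5 epochs. The sketch below reproduces that arithmetic and a cosine learning-rate schedule. It assumes zero warmup, since the card lists none, and a half-cosine decay to 0 over `training_steps`, which is what `transformers`' `get_cosine_schedule_with_warmup` does with its default `num_cycles=0.5`.

```python
import math

# Values copied from the updated model card.
learning_rate = 5e-07
train_batch_size = 8
gradient_accumulation_steps = 4
training_steps = 1000

# Assumption: a single device, so no extra multiplier for data parallelism.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 32  # matches the card and the old run's batch size

# The eval entry (step 460, epoch 2.3) implies the number of optimizer steps per epoch.
steps_per_epoch = round(460 / 2.3)  # 200, as in the old training-results table
epochs = training_steps / steps_per_epoch  # 5.0


def cosine_lr(step: int, peak_lr: float = learning_rate,
              total_steps: int = training_steps, warmup_steps: int = 0) -> float:
    """Linear warmup, then half-cosine decay from peak_lr to 0 (warmup_steps=0 is an assumption)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 460, 1000):
    print(step, cosine_lr(step))
```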
model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7eddc8ef9b4fb6bdadcbdd34c623957fa2e82482a4c6d6e892d5cc2507daba36
+oid sha256:2eb09400cb88e3ec576886b7cff51b1b77ced1a4c66ce7540ab6b58045489a9d
 size 1187780840
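Both `model.safetensors` and `training_args.bin` are stored through Git LFS, so the diff shows only the pointer file: the spec version, the SHA-256 `oid` of the real content, and its `size` in bytes. For the weights, the oid changed while the size stayed at 1187780840 bytes, which is consistent with unchanged tensor shapes and dtypes and new parameter values. A downloaded file can be checked against its pointer with a short script; the sketch below parses the `key value` pointer lines shown in this diff and compares size and hash. The function names and the local path are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse a Git LFS pointer file (one "key value" pair per line) into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
    return fields


def verify_against_pointer(path: Path, pointer: dict[str, str],
                           chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first, before hashing the whole file.
    if path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so a ~1.2 GB weights file is never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# Pointer for the new model.safetensors, copied from the diff.
new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:2eb09400cb88e3ec576886b7cff51b1b77ced1a4c66ce7540ab6b58045489a9d\n"
    "size 1187780840\n"
)
# Hypothetical local path to the downloaded weights:
# print(verify_against_pointer(Path("model.safetensors"), new_pointer))
```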

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b8d00d07f2ec02de1b20c0cd5722240bf550cc167dc0b12926ec2631e0d666cc
+oid sha256:368c0e8b8dbd049ffe8ae82a1cdafd47ede42a5ecc7605df19fe34f003ae4edd
 size 5944