LBusser/mistral_dpo0.0
Browse files
- README.md +30 -15
- adapter_config.json +2 -2
- adapter_model.safetensors +1 -1
- runs/Dec12_09-39-28_b87680adc090/events.out.tfevents.1702373991.b87680adc090.694.0 +3 -0
- runs/Dec12_09-46-01_b87680adc090/events.out.tfevents.1702374367.b87680adc090.3700.0 +3 -0
- runs/Dec12_09-51-52_b87680adc090/events.out.tfevents.1702374715.b87680adc090.3700.1 +3 -0
- runs/Dec12_09-56-11_b87680adc090/events.out.tfevents.1702374979.b87680adc090.6366.0 +3 -0
- training_args.bin +1 -1
README.md
CHANGED
@@ -15,15 +15,15 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss:
-- Rewards/chosen:
-- Rewards/rejected: -0.
-- Rewards/accuracies: 0.
-- Rewards/margins:
-- Logps/rejected: -
-- Logps/chosen: -
-- Logits/rejected: -2.
-- Logits/chosen: -2.
+- Loss: 0.6751
+- Rewards/chosen: 0.0215
+- Rewards/rejected: -0.0002
+- Rewards/accuracies: 0.4375
+- Rewards/margins: 0.0217
+- Logps/rejected: -132.4150
+- Logps/chosen: -333.1984
+- Logits/rejected: -2.7074
+- Logits/chosen: -2.3899
 
 ## Model description
 
@@ -49,18 +49,33 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
-- training_steps:
+- training_steps: 200
 - mixed_precision_training: Native AMP
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.
-| 0.
-
-
-
+| 0.7508 | 0.01 | 10 | 0.7479 | -0.3566 | -0.2195 | 0.25 | -0.1371 | -351.7834 | -711.3196 | -1.6251 | -1.4448 |
+| 0.982 | 0.01 | 20 | 0.7765 | -0.5130 | -0.3405 | 0.25 | -0.1726 | -472.7075 | -867.7224 | -1.1628 | -1.0511 |
+| 0.6985 | 0.01 | 30 | 0.6899 | 0.0062 | 0.0027 | 0.375 | 0.0036 | -129.5716 | -348.4551 | -2.7357 | -2.3605 |
+| 0.6959 | 0.02 | 40 | 0.6935 | 0.0008 | 0.0022 | 0.25 | -0.0014 | -130.0675 | -353.8832 | -2.7275 | -2.3561 |
+| 0.6944 | 0.03 | 50 | 0.6892 | 0.0073 | 0.0040 | 0.4375 | 0.0033 | -128.2573 | -347.3910 | -2.7124 | -2.3589 |
+| 0.7785 | 0.03 | 60 | 0.7361 | -0.4130 | -0.2866 | 0.375 | -0.1264 | -418.8091 | -767.6629 | -1.3320 | -1.2310 |
+| 0.7009 | 0.04 | 70 | 0.7892 | -0.5637 | -0.3765 | 0.3125 | -0.1872 | -508.7737 | -918.3933 | -1.1171 | -1.0132 |
+| 0.7886 | 0.04 | 80 | 0.7862 | -0.5738 | -0.3892 | 0.3125 | -0.1845 | -521.4880 | -928.4485 | -1.1127 | -1.0064 |
+| 0.7059 | 0.04 | 90 | 0.7127 | -0.0370 | -0.0108 | 0.4375 | -0.0263 | -143.0086 | -391.7115 | -2.6542 | -2.3045 |
+| 0.6793 | 0.05 | 100 | 0.6981 | -0.0357 | -0.0284 | 0.375 | -0.0073 | -160.6859 | -390.4216 | -2.5199 | -2.2133 |
+| 0.7085 | 0.06 | 110 | 0.7039 | -0.0251 | -0.0089 | 0.3125 | -0.0162 | -141.1216 | -379.7617 | -2.6806 | -2.3312 |
+| 0.6959 | 0.06 | 120 | 0.6974 | -0.0162 | -0.0077 | 0.375 | -0.0085 | -139.9174 | -370.8595 | -2.6925 | -2.3406 |
+| 0.6897 | 0.07 | 130 | 0.6948 | -0.0122 | -0.0069 | 0.3125 | -0.0053 | -139.1202 | -366.9146 | -2.6971 | -2.3477 |
+| 0.6897 | 0.07 | 140 | 0.6935 | -0.0104 | -0.0067 | 0.3125 | -0.0038 | -138.8917 | -365.1371 | -2.6948 | -2.3576 |
+| 0.7015 | 0.07 | 150 | 0.6864 | 0.0011 | -0.0042 | 0.4375 | 0.0054 | -136.4684 | -353.5512 | -2.6973 | -2.3710 |
+| 0.6497 | 0.08 | 160 | 0.6814 | 0.0099 | -0.0023 | 0.4375 | 0.0122 | -134.5819 | -344.8182 | -2.7048 | -2.3806 |
+| 0.6893 | 0.09 | 170 | 0.6787 | 0.0147 | -0.0015 | 0.4375 | 0.0161 | -133.7108 | -340.0247 | -2.7106 | -2.3874 |
+| 0.7002 | 0.09 | 180 | 0.6776 | 0.0168 | -0.0010 | 0.4375 | 0.0178 | -133.2137 | -337.8709 | -2.7120 | -2.3888 |
+| 0.6875 | 0.1 | 190 | 0.6755 | 0.0209 | -0.0002 | 0.4375 | 0.0211 | -132.4327 | -333.8066 | -2.7093 | -2.3902 |
+| 0.6781 | 0.1 | 200 | 0.6751 | 0.0215 | -0.0002 | 0.4375 | 0.0217 | -132.4150 | -333.1984 | -2.7074 | -2.3899 |
 
 
 ### Framework versions
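The updated card fills in the DPO training settings (linear scheduler, 2 warmup steps, 200 training steps, Native AMP) and evaluation metrics logged every 10 steps. As a rough illustration of how such a run is typically wired up, below is a minimal sketch using trl's `DPOTrainer` with its late-2023 API; the learning rate, batch size, beta, LoRA rank, and the preference dataset are not visible in this diff, so those values are placeholders rather than the author's actual settings.

```python
# Hypothetical reconstruction of the training setup described in the README.
# Only warmup_steps=2, max_steps=200, the linear scheduler, fp16 (Native AMP),
# and the q_proj/v_proj target modules come from this commit; everything
# marked "assumption" below is a placeholder.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")  # needs auto-gptq/optimum

peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    target_modules=["v_proj", "q_proj"],  # matches adapter_config.json in this commit
    r=16, lora_alpha=16, lora_dropout=0.05,  # assumption: rank/alpha/dropout not shown in the diff
)

args = TrainingArguments(
    output_dir="mistral_dpo0.0",
    per_device_train_batch_size=1,   # assumption, not in the diff
    learning_rate=2e-4,              # assumption, not in the diff
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=200,
    fp16=True,                       # "Native AMP"
    logging_steps=10,
)

# DPO expects "prompt", "chosen" and "rejected" columns in the dataset.
train_dataset = load_dataset("some/preference_dataset", split="train")  # placeholder

trainer = DPOTrainer(
    model,
    ref_model=None,                  # with a PEFT adapter, the frozen base acts as the reference model
    args=args,
    beta=0.1,                        # assumption: trl default
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Note that newer trl releases move most of these arguments into a `DPOConfig`; the sketch above follows the `TrainingArguments`-based signature that was current around the date of this commit.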
adapter_config.json
CHANGED
@@ -19,8 +19,8 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "
-    "
+    "v_proj",
+    "q_proj"
   ],
   "task_type": "CAUSAL_LM"
 }
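The adapter now targets only the `q_proj` and `v_proj` projections. For reference, a minimal sketch of loading this adapter for inference with peft, assuming the GPTQ base model loads through transformers (auto-gptq or optimum installed) and that `LBusser/mistral_dpo0.0` is the adapter repo id:

```python
# Minimal inference sketch: attach the DPO-trained LoRA adapter from this repo
# to the GPTQ base model it was fine-tuned from.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
adapter_id = "LBusser/mistral_dpo0.0"  # this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")  # requires auto-gptq or optimum
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "What is direct preference optimization?"
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```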
adapter_model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:aaeb5cc665e1f939133a39db9ab59cf2d621aa371cad06aca62202c2a67abf75
 size 13648432
runs/Dec12_09-39-28_b87680adc090/events.out.tfevents.1702373991.b87680adc090.694.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2cc2ceb4007c021601ee251c7137a1f6adb3db7d0d1a4defebefcf08f2196cae
+size 10896
runs/Dec12_09-46-01_b87680adc090/events.out.tfevents.1702374367.b87680adc090.3700.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f1bf8fd98d884abca67fb3ebb53e7b6955d08d1ad709a54d365540dd901d66f5
+size 10896
runs/Dec12_09-51-52_b87680adc090/events.out.tfevents.1702374715.b87680adc090.3700.1
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c9dc8eeb28ffb5e984274046cb4f394e6cab26b8e4255ce1c250c784363b1464
+size 6120
runs/Dec12_09-56-11_b87680adc090/events.out.tfevents.1702374979.b87680adc090.6366.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:73dcbd0a1601f245b3d60bade248cb7341b18bec22da1b95086f5bb890c170eb
+size 33043
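The four added `runs/` files are Git LFS pointers to TensorBoard event logs for the training runs above. Assuming the repository is cloned with git-lfs so the actual event files are present, they can be browsed locally, for example:

```python
# Launch TensorBoard on the runs/ directory of a local clone of this repo.
# Assumes the tensorboard package is installed and the LFS files were pulled.
from tensorboard import program

tb = program.TensorBoard()
tb.configure(argv=[None, "--logdir", "runs"])
print("TensorBoard running at", tb.launch())
```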
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:2644277bbc3e5cee9ff03eb870f562a3b1ecced88c2056d85586e3095b521b63
 size 4155