mistral-7b-dpo-constitutional-ai

This model is a fine-tuned version of alignment-handbook/mistral-7b-sft-constitutional-ai on the HuggingFaceH4/ultrafeedback_binarized and the HuggingFaceH4/cai-conversation-harmless datasets. It achieves the following results on the evaluation set:

  • Loss: 0.6730
  • Rewards/chosen: -13.2619
  • Rewards/rejected: -22.1436
  • Rewards/accuracies: 0.7075
  • Rewards/margins: 8.8817
  • Logps/rejected: -393.3515
  • Logps/chosen: -326.8571
  • Logits/rejected: -2.4037
  • Logits/chosen: -2.4315

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3

Training results

Training Loss Epoch Step Logits/chosen Logits/rejected Logps/chosen Logps/rejected Validation Loss Rewards/accuracies Rewards/chosen Rewards/margins Rewards/rejected
0.6896 0.02 100 -3.1677 -3.1771 -194.1546 -171.9473 0.6890 0.5475 0.0084 0.0115 -0.0031
0.6665 0.04 200 0.6632 0.0758 -0.0052 0.6450 0.0810 -171.9678 -193.4802 -3.1751 -3.1655
0.6381 0.06 300 0.6210 0.1621 -0.0714 0.6875 0.2335 -172.6297 -192.6168 -3.1717 -3.1615
0.5753 0.08 400 0.5865 0.0774 -0.3384 0.6975 0.4158 -175.2999 -193.4643 -3.1545 -3.1444
0.5092 0.1 500 0.5518 -0.0506 -0.6697 0.7100 0.6192 -178.6130 -194.7436 -3.1476 -3.1387
0.5374 0.12 600 0.5302 -0.3254 -1.1236 0.7150 0.7982 -183.1516 -197.4919 -3.1268 -3.1175
0.4719 0.14 700 0.5122 -0.5479 -1.5575 0.7225 1.0096 -187.4913 -199.7175 -3.1106 -3.1009
0.5036 0.16 800 0.5093 -0.3534 -1.5324 0.7075 1.1789 -187.2395 -197.7726 -3.1155 -3.1069
0.456 0.17 900 0.5018 -1.1302 -2.4591 0.7250 1.3289 -196.5069 -205.5396 -3.0940 -3.0859
0.574 0.19 1000 0.5006 -1.2309 -2.6549 0.7200 1.4240 -198.4646 -206.5471 -3.0788 -3.0726
0.5162 0.21 1100 0.5014 -1.8915 -3.4039 0.7125 1.5124 -205.9553 -213.1533 -3.0310 -3.0228
0.5772 0.23 1200 0.4930 -2.7962 -4.4210 0.7150 1.6247 -216.1257 -222.2005 -3.0464 -3.0409
0.5046 0.25 1300 0.4965 -2.0275 -3.8599 0.7075 1.8323 -210.5148 -214.5135 -2.9547 -2.9496
0.4987 0.27 1400 0.4858 -2.1834 -4.1148 0.7050 1.9313 -213.0636 -216.0722 -2.9468 -2.9388
0.4808 0.29 1500 0.4956 -2.2201 -4.2035 0.7225 1.9835 -213.9512 -216.4386 -2.8319 -2.8259
0.5445 0.31 1600 0.4917 -2.7200 -4.6427 0.7150 1.9227 -218.3425 -221.4376 -2.8427 -2.8415
0.5903 0.33 1700 0.5078 -2.4677 -4.5796 0.6850 2.1119 -217.7116 -218.9146 -2.9215 -2.9204
0.4285 0.35 1800 0.4977 -2.7944 -5.1756 0.6825 2.3811 -223.6717 -222.1824 -2.7299 -2.7308
0.5443 0.37 1900 0.4874 -3.1231 -5.5313 0.6950 2.4081 -227.2286 -225.4695 -2.8392 -2.8397
0.4776 0.39 2000 0.4851 -3.3957 -5.8725 0.7000 2.4767 -230.6406 -228.1953 -2.6600 -2.6676
0.5387 0.41 2100 0.5211 -3.8093 -6.1883 0.7200 2.3790 -233.7993 -232.3311 -2.8038 -2.8139
0.5673 0.43 2200 0.5023 -3.5883 -5.9175 0.7150 2.3292 -231.0912 -230.1214 -2.8037 -2.8138
0.5005 0.45 2300 0.4872 -4.1436 -6.3181 0.7100 2.1745 -235.0966 -235.6737 -2.8294 -2.8332
0.6603 0.47 2400 0.5267 -3.3589 -5.5272 0.7075 2.1683 -227.1882 -227.8270 -2.8627 -2.8651
0.5727 0.49 2500 0.4951 -3.3625 -5.6616 0.6975 2.2991 -228.5322 -227.8636 -2.8476 -2.8481
0.5962 0.5 2600 0.4849 -3.1557 -5.5646 0.7050 2.4088 -227.5615 -225.7954 -2.7944 -2.7953
0.5934 0.52 2700 0.4860 -3.8532 -6.7557 0.7125 2.9025 -239.4730 -232.7698 -2.7885 -2.7894
0.5091 0.54 2800 0.4818 -4.7384 -7.6121 0.7225 2.8738 -248.0370 -241.6216 -2.7868 -2.7894
0.4864 0.56 2900 0.4803 -4.1245 -6.9430 0.7175 2.8185 -241.3460 -235.4826 -2.7678 -2.7678
0.4882 0.58 3000 0.4968 -3.5637 -6.0747 0.6975 2.5109 -232.6625 -229.8754 -2.7911 -2.7899
0.4958 0.6 3100 0.4830 -4.0211 -6.7889 0.7000 2.7679 -239.8054 -234.4488 -2.8052 -2.8041
0.6056 0.62 3200 0.4876 -3.3706 -6.0612 0.7125 2.6906 -232.5282 -227.9439 -2.8433 -2.8414
0.6339 0.64 3300 0.5043 -3.5676 -6.4130 0.7150 2.8453 -236.0455 -229.9143 -2.7996 -2.8006
0.5974 0.66 3400 0.5701 -4.3288 -6.8724 0.6975 2.5436 -240.6396 -237.5260 -2.6382 -2.6407
0.4836 0.68 3500 0.5171 -5.5367 -8.5107 0.7100 2.9739 -257.0226 -249.6052 -2.5631 -2.5693
0.6342 0.7 3600 0.5060 -4.7743 -7.7389 0.7125 2.9646 -249.3053 -241.9812 -2.5904 -2.5960
0.5143 0.72 3700 0.4835 -3.2159 -5.8473 0.7000 2.6314 -230.3890 -226.3973 -2.6497 -2.6518
0.5471 0.74 3800 0.5060 -4.2691 -7.0738 0.6925 2.8047 -242.6543 -236.9293 -2.7508 -2.7518
0.4817 0.76 3900 0.5294 -4.4262 -7.2968 0.6975 2.8706 -244.8839 -238.4999 -2.6395 -2.6443
0.4616 0.78 4000 0.5019 -4.5134 -7.6868 0.7050 3.1733 -248.7837 -239.3724 -2.6056 -2.6114
0.5042 0.8 4100 0.5084 -4.2298 -7.2113 0.6975 2.9816 -244.0292 -236.5357 -2.5689 -2.5812
0.5486 0.82 4200 0.5036 -5.0660 -8.1825 0.7025 3.1165 -253.7406 -244.8979 -2.6022 -2.6123
0.4509 0.83 4300 0.4977 -5.3656 -8.6440 0.7200 3.2784 -258.3560 -247.8943 -2.6750 -2.6865
0.4964 0.85 4400 0.5052 -4.1702 -7.4107 0.7025 3.2405 -246.0230 -235.9397 -2.6844 -2.6917
0.5711 0.87 4500 0.4862 -4.8093 -8.4396 0.7100 3.6303 -256.3118 -242.3308 -2.5774 -2.5880
0.5481 0.89 4600 0.4935 -3.3995 -6.4894 0.7100 3.0899 -236.8096 -228.2326 -2.6268 -2.6335
0.4468 0.91 4700 0.4905 -3.7618 -6.8195 0.7000 3.0577 -240.1110 -231.8562 -2.7280 -2.7352
0.5001 0.93 4800 0.4867 -4.5571 -8.3247 0.7025 3.7676 -255.1630 -239.8094 -2.7686 -2.7782
0.4342 0.95 4900 0.4948 -4.5786 -7.9872 0.7000 3.4086 -251.7877 -240.0242 -2.7917 -2.7980
0.5148 0.97 5000 0.4877 -5.1096 -8.4529 0.6925 3.3433 -256.4448 -245.3341 -2.8001 -2.8058
0.456 0.99 5100 0.4937 -4.2851 -7.4575 0.6950 3.1723 -246.4907 -237.0894 -2.6952 -2.6993
0.1524 1.01 5200 0.4892 -4.5395 -8.3117 0.7050 3.7722 -255.0330 -239.6328 -2.6544 -2.6617
0.1647 1.03 5300 0.5095 -5.2562 -9.5283 0.7000 4.2721 -267.1991 -246.8001 -2.6140 -2.6246
0.1757 1.05 5400 0.5466 -4.5672 -8.6264 0.7100 4.0592 -258.1795 -239.9100 -2.5278 -2.5419
0.1386 1.07 5500 0.5161 -5.6603 -10.2201 0.7000 4.5598 -274.1167 -250.8408 -2.6169 -2.6286
0.0945 1.09 5600 0.5457 -6.4516 -10.8169 0.7075 4.3653 -280.0851 -258.7542 -2.6735 -2.6844
0.1396 1.11 5700 0.5313 -5.8463 -9.3298 0.6875 3.4835 -265.2138 -252.7006 -2.6761 -2.6860
0.0672 1.13 5800 0.5429 -4.8659 -8.1130 0.6825 3.2471 -253.0459 -242.8967 -2.7019 -2.7118
0.1091 1.15 5900 0.5826 -6.4030 -10.4523 0.6950 4.0493 -276.4388 -258.2681 -2.6196 -2.6339
0.1643 1.16 6000 0.5503 -6.7800 -11.1528 0.7050 4.3728 -283.4437 -262.0378 -2.5799 -2.5910
0.1091 1.18 6100 0.5209 -6.3057 -10.2456 0.7075 3.9399 -274.3719 -257.2953 -2.6904 -2.7025
0.1128 1.2 6200 0.5366 -6.6096 -11.0874 0.7050 4.4778 -282.7897 -260.3337 -2.6117 -2.6289
0.2009 1.22 6300 0.5346 -7.9528 -12.6518 0.7100 4.6990 -298.4337 -273.7660 -2.7132 -2.7317
0.1862 1.24 6400 0.5410 -8.5641 -13.2525 0.7050 4.6884 -304.4410 -279.8788 -2.6740 -2.6900
0.137 1.26 6500 0.6052 -5.1981 -9.2068 0.6850 4.0087 -263.9841 -246.2192 -2.7289 -2.7445
0.2336 1.28 6600 0.5168 -6.2470 -10.5787 0.6950 4.3317 -277.7033 -256.7079 -2.6187 -2.6338
0.1341 1.3 6700 0.5187 -6.1031 -10.6578 0.6975 4.5547 -278.4937 -255.2690 -2.7004 -2.7111
0.0945 1.32 6800 0.5340 -6.7845 -11.3285 0.7175 4.5440 -285.2012 -262.0835 -2.5875 -2.5996
0.1569 1.34 6900 0.5556 -7.1182 -11.5857 0.7025 4.4675 -287.7730 -265.4196 -2.4990 -2.5094
0.1122 1.36 7000 0.5235 -6.6992 -11.4976 0.7075 4.7983 -286.8915 -261.2301 -2.5685 -2.5817
0.126 1.38 7100 0.5673 -7.6522 -12.6005 0.7025 4.9483 -297.9209 -270.7601 -2.5857 -2.5972
0.0913 1.4 7200 0.5452 -8.0889 -13.4935 0.7075 5.4046 -306.8511 -275.1268 -2.5162 -2.5292
0.1582 1.42 7300 0.5486 -8.1334 -12.8551 0.6800 4.7218 -300.4672 -275.5717 -2.6257 -2.6350
0.1205 1.44 7400 0.5641 -7.6471 -12.6048 0.6925 4.9577 -297.9639 -270.7087 -2.4955 -2.5095
0.1483 1.46 7500 0.5353 -6.8197 -11.9537 0.7100 5.1340 -291.4525 -262.4351 -2.4457 -2.4622
0.1431 1.48 7600 0.5331 -7.2397 -12.3675 0.6975 5.1277 -295.5908 -266.6355 -2.4740 -2.4903
0.1604 1.49 7700 0.5209 -7.0411 -12.0568 0.7050 5.0158 -292.4845 -264.6489 -2.5381 -2.5512
0.1578 1.51 7800 0.5121 -6.9548 -11.8277 0.6950 4.8729 -290.1931 -263.7859 -2.5551 -2.5713
0.1548 1.53 7900 0.5030 -7.1085 -11.7981 0.6900 4.6896 -289.8969 -265.3228 -2.5464 -2.5678
0.114 1.55 8000 0.5224 -7.2558 -12.1665 0.7075 4.9107 -293.5809 -266.7961 -2.5693 -2.5890
0.112 1.57 8100 0.5374 -6.0601 -10.5624 0.7000 4.5023 -277.5395 -254.8386 -2.5735 -2.5933
0.1436 1.59 8200 0.5276 -7.0490 -11.9957 0.7175 4.9467 -291.8731 -264.7281 -2.5737 -2.5931
0.1369 1.61 8300 0.5191 -6.7010 -11.3389 0.6875 4.6378 -285.3046 -261.2485 -2.5764 -2.5965
0.1545 1.63 8400 0.5306 -7.5656 -12.7404 0.6975 5.1748 -299.3195 -269.8939 -2.4636 -2.4827
0.1052 1.65 8500 0.5248 -9.0789 -14.5883 0.6975 5.5093 -317.7987 -285.0275 -2.3273 -2.3513
0.1193 1.67 8600 0.5251 -8.3078 -13.6412 0.6925 5.3334 -308.3281 -277.3158 -2.3198 -2.3432
0.143 1.69 8700 0.5170 -7.0677 -11.8368 0.7000 4.7691 -290.2836 -264.9151 -2.4523 -2.4667
0.0811 1.71 8800 0.5284 -9.8027 -14.9178 0.6925 5.1151 -321.0940 -292.2650 -2.4860 -2.5043
0.1453 1.73 8900 0.5207 -9.0979 -13.9403 0.6900 4.8424 -311.3193 -285.2171 -2.4686 -2.4829
0.1157 1.75 9000 0.5219 -8.2920 -13.4085 0.6950 5.1166 -306.0013 -277.1577 -2.4449 -2.4595
0.127 1.77 9100 0.5276 -6.9887 -11.5591 0.6825 4.5704 -287.5068 -264.1252 -2.4681 -2.4831
0.0787 1.79 9200 0.5369 -6.7075 -11.2769 0.7000 4.5694 -284.6848 -261.3131 -2.4596 -2.4762
0.1575 1.81 9300 0.5331 -8.4908 -13.7127 0.7050 5.2220 -309.0434 -279.1460 -2.4321 -2.4546
0.1627 1.82 9400 0.5200 -6.8366 -10.9055 0.7125 4.0689 -280.9706 -262.6037 -2.5689 -2.5831
0.1334 1.84 9500 0.5144 -7.5260 -11.8235 0.7150 4.2975 -290.1509 -269.4985 -2.6028 -2.6165
0.1662 1.86 9600 0.5175 -7.1968 -11.7428 0.6975 4.5461 -289.3443 -266.2057 -2.5049 -2.5208
0.1138 1.88 9700 0.5252 -7.5737 -12.3038 0.7025 4.7301 -294.9536 -269.9750 -2.4780 -2.4926
0.2393 1.9 9800 0.5221 -7.4920 -12.0828 0.7000 4.5908 -292.7436 -269.1580 -2.5587 -2.5731
0.1172 1.92 9900 0.5310 -7.7405 -12.5669 0.7050 4.8264 -297.5852 -271.6433 -2.6025 -2.6177
0.0687 1.94 10000 0.5245 -7.4571 -12.0960 0.7025 4.6388 -292.8755 -268.8094 -2.6112 -2.6241
0.1132 1.96 10100 0.5272 -6.7368 -11.6496 0.7125 4.9128 -288.4121 -261.6057 -2.5953 -2.6080
0.1348 1.98 10200 0.5210 -7.7647 -12.7599 0.7050 4.9952 -299.5146 -271.8849 -2.6272 -2.6401
0.1342 2.0 10300 0.5258 -7.4707 -12.4888 0.7050 5.0181 -296.8041 -268.9455 -2.6177 -2.6298
0.0845 2.02 10400 0.5396 -8.2669 -13.7888 0.7050 5.5218 -309.8035 -276.9074 -2.5951 -2.6106
0.0723 2.04 10500 0.5642 -8.5547 -14.4525 0.7100 5.8979 -316.4410 -279.7846 -2.5829 -2.5997
0.0411 2.06 10600 0.5769 -10.3244 -16.4855 0.7100 6.1611 -336.7709 -297.4823 -2.5386 -2.5588
0.0459 2.08 10700 0.5941 -10.0803 -16.5051 0.7050 6.4248 -336.9667 -295.0412 -2.5232 -2.5440
0.0586 2.1 10800 0.5881 -10.2406 -16.7137 0.7075 6.4731 -339.0529 -296.6443 -2.5167 -2.5395
0.0599 2.12 10900 0.6149 -11.8905 -18.7301 0.7025 6.8396 -359.2173 -313.1431 -2.4992 -2.5247
0.0518 2.14 11000 0.6386 -11.8801 -18.8420 0.7050 6.9619 -360.3356 -313.0391 -2.5353 -2.5590
0.0668 2.15 11100 0.6274 -11.6788 -18.8639 0.7000 7.1851 -360.5554 -311.0262 -2.5090 -2.5340
0.1038 2.17 11200 0.6328 -11.7225 -19.0866 0.6975 7.3642 -362.7824 -311.4629 -2.5016 -2.5274
0.0684 2.19 11300 0.6159 -11.1067 -18.0268 0.7000 6.9202 -352.1844 -305.3046 -2.5287 -2.5490
0.1067 2.21 11400 0.6008 -10.1890 -16.6563 0.6975 6.4674 -338.4790 -296.1276 -2.5787 -2.5974
0.076 2.23 11500 0.6069 -9.1764 -15.6022 0.7025 6.4258 -327.9375 -286.0017 -2.5649 -2.5814
0.0831 2.25 11600 0.6081 -9.5029 -16.1909 0.7050 6.6881 -333.8254 -289.2670 -2.5353 -2.5539
0.0767 2.27 11700 0.6232 -9.8702 -17.4220 0.7050 7.5518 -346.1356 -292.9401 -2.4918 -2.5128
0.0637 2.29 11800 0.6183 -10.4232 -18.0363 0.7000 7.6131 -352.2786 -298.4702 -2.4901 -2.5110
0.0578 2.31 11900 0.6302 -10.3920 -18.1840 0.7100 7.7920 -353.7556 -298.1579 -2.5045 -2.5246
0.0665 2.33 12000 0.6309 -10.2916 -18.1950 0.6950 7.9034 -353.8656 -297.1541 -2.5204 -2.5402
0.0854 2.35 12100 0.6348 -10.5627 -18.5024 0.7000 7.9397 -356.9398 -299.8650 -2.5142 -2.5344
0.0663 2.37 12200 0.6440 -10.3562 -18.2213 0.7000 7.8651 -354.1292 -297.8000 -2.5163 -2.5366
0.0926 2.39 12300 0.6197 -9.9404 -17.5147 0.7050 7.5743 -347.0634 -293.6423 -2.5421 -2.5607
0.0846 2.41 12400 0.6193 -8.7158 -15.2039 0.7075 6.4881 -323.9550 -281.3965 -2.5292 -2.5454
0.0552 2.43 12500 0.6213 -9.1585 -15.8640 0.7025 6.7055 -330.5561 -285.8229 -2.5610 -2.5763
0.0667 2.45 12600 0.6205 -10.2959 -17.4638 0.7075 7.1679 -346.5536 -297.1967 -2.5533 -2.5720
0.0529 2.47 12700 0.6300 -10.4017 -17.5790 0.7100 7.1773 -347.7064 -298.2553 -2.5342 -2.5525
0.0572 2.48 12800 0.6499 -10.9914 -18.7161 0.7050 7.7246 -359.0765 -304.1523 -2.4994 -2.5215
0.0687 2.5 12900 0.6573 -11.8845 -19.7886 0.7050 7.9041 -369.8018 -313.0834 -2.5499 -2.5703
0.0658 2.52 13000 0.6460 -12.3055 -20.3852 0.7075 8.0797 -375.7680 -317.2932 -2.5374 -2.5585
0.0897 2.54 13100 0.6673 -12.6608 -20.9130 0.7000 8.2522 -381.0459 -320.8460 -2.4577 -2.4810
0.0386 2.56 13200 0.6575 -12.9730 -21.4438 0.7000 8.4707 -386.3536 -323.9682 -2.4453 -2.4703
0.0771 2.58 13300 0.6375 -11.0609 -18.4087 0.7050 7.3478 -356.0026 -304.8467 -2.5407 -2.5590
0.0704 2.6 13400 0.6408 -11.4177 -18.9599 0.7050 7.5422 -361.5145 -308.4147 -2.5313 -2.5503
0.0715 2.62 13500 0.6433 -11.8351 -19.8071 0.7025 7.9721 -369.9872 -312.5887 -2.5056 -2.5267
0.0511 2.64 13600 0.6403 -11.2684 -19.2078 0.6975 7.9394 -363.9937 -306.9222 -2.4818 -2.5038
0.0848 2.66 13700 0.6501 -12.1104 -20.3324 0.7025 8.2220 -375.2401 -315.3422 -2.4718 -2.4955
0.0724 2.68 13800 0.6394 -12.3498 -20.4017 0.7025 8.0519 -375.9328 -317.7358 -2.4852 -2.5077
0.0735 2.7 13900 0.6576 -13.0635 -21.4725 0.7050 8.4091 -386.6412 -324.8728 -2.4330 -2.4579
0.0836 2.72 14000 0.6427 -12.7069 -20.9002 0.7050 8.1933 -380.9181 -321.3069 -2.4284 -2.4533
0.0647 2.74 14100 0.6445 -12.4746 -20.6872 0.7100 8.2126 -378.7882 -318.9844 -2.4287 -2.4541
0.0732 2.76 14200 0.6514 -11.6405 -19.7905 0.7100 8.1499 -369.8207 -310.6434 -2.4271 -2.4516
0.05 2.78 14300 0.6599 -12.1077 -20.4715 0.7125 8.3638 -376.6314 -315.3156 -2.3948 -2.4208
0.0881 2.8 14400 0.6585 -11.5941 -19.5927 0.7150 7.9986 -367.8432 -310.1794 -2.4139 -2.4396
0.0992 2.81 14500 0.6617 -11.9570 -20.2009 0.7075 8.2439 -373.9248 -313.8076 -2.4018 -2.4285
0.0582 2.83 14600 0.6693 -12.4244 -20.9747 0.7025 8.5502 -381.6627 -318.4825 -2.3966 -2.4239
0.0536 2.85 14700 0.6742 -12.5108 -21.1850 0.7025 8.6742 -383.7661 -319.3458 -2.3764 -2.4054
0.0615 2.87 14800 0.6776 -12.8026 -21.6040 0.7025 8.8015 -387.9562 -322.2637 -2.3824 -2.4112
0.0532 2.89 14900 0.6769 -12.9977 -21.8501 0.7025 8.8523 -390.4167 -324.2155 -2.3852 -2.4138
0.0742 2.91 15000 0.6786 -13.2980 -22.2390 0.6950 8.9410 -394.3063 -327.2182 -2.3807 -2.4097
0.0626 2.93 15100 0.6752 -13.2158 -22.0873 0.7025 8.8714 -392.7889 -326.3966 -2.3974 -2.4253
0.046 2.95 15200 0.6734 -13.2380 -22.1199 0.7050 8.8819 -393.1146 -326.6176 -2.3977 -2.4255
0.0464 2.97 15300 0.6734 -13.2348 -22.1145 0.7025 8.8798 -393.0614 -326.5859 -2.4020 -2.4298
0.0599 2.99 15400 0.6729 -13.2586 -22.1444 0.7075 8.8859 -393.3602 -326.8238 -2.4035 -2.4313

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.1
Downloads last month
58
Safetensors
Model size
7.24B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for vwxyzjn/mistral-7b-dpo-constitutional-ai

Finetuned
(3)
this model
Quantizations
1 model

Datasets used to train vwxyzjn/mistral-7b-dpo-constitutional-ai