KoNqUeRoR3891 committed
Commit 09ffb66 · verified · 1 Parent(s): aefdae1

Model save

Files changed (3)
  1. README.md +93 -0
  2. generation_config.json +6 -0
  3. model.safetensors +1 -1
README.md ADDED
@@ -0,0 +1,93 @@
+ ---
+ library_name: transformers
+ license: mit
+ base_model: openai-community/gpt2
+ tags:
+ - trl
+ - orpo
+ - generated_from_trainer
+ datasets:
+ - piqa
+ model-index:
+ - name: HW2-orpo
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # HW2-orpo
+
+ This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) on the piqa dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 3.8617
+ - Rewards/chosen: -0.3716
+ - Rewards/rejected: -0.3885
+ - Rewards/accuracies: 0.6390
+ - Rewards/margins: 0.0170
+ - Logps/rejected: -3.8851
+ - Logps/chosen: -3.7156
+ - Logits/rejected: -3.3968
+ - Logits/chosen: -3.5059
+ - Nll Loss: 3.7885
+ - Log Odds Ratio: -0.7324
+ - Log Odds Chosen: 0.1830
+
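+ A minimal usage sketch for generation, assuming this card's name maps to the Hub repo id `KoNqUeRoR3891/HW2-orpo` (substitute the actual path if it differs):
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ repo_id = "KoNqUeRoR3891/HW2-orpo"  # assumed repo id
+ tokenizer = AutoTokenizer.from_pretrained(repo_id)
+ model = AutoModelForCausalLM.from_pretrained(repo_id)
+
+ prompt = "To keep a cutting board from slipping,"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=40,
+     pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
+ )
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+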
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 42
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 5
+ - mixed_precision_training: Native AMP
+
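+ A minimal sketch of a matching run, assuming TRL's `ORPOConfig`/`ORPOTrainer` API (trl <= 0.11, where the tokenizer is passed as `tokenizer=`) and a hypothetical piqa-to-preference mapping; the exact preprocessing behind this card is not recorded in it:
+
+ ```python
+ from datasets import load_dataset
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from trl import ORPOConfig, ORPOTrainer
+
+ model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
+ tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
+ tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
+
+ def to_preference(ex):
+     # Hypothetical mapping: treat the labeled piqa solution as "chosen".
+     sols = [ex["sol1"], ex["sol2"]]
+     return {
+         "prompt": ex["goal"],
+         "chosen": sols[ex["label"]],
+         "rejected": sols[1 - ex["label"]],
+     }
+
+ train_ds = load_dataset("piqa", split="train", trust_remote_code=True).map(
+     to_preference, remove_columns=["goal", "sol1", "sol2", "label"]
+ )
+
+ config = ORPOConfig(
+     output_dir="HW2-orpo",
+     learning_rate=5e-5,
+     per_device_train_batch_size=1,
+     per_device_eval_batch_size=1,
+     gradient_accumulation_steps=8,  # effective train batch size 8
+     num_train_epochs=5,
+     lr_scheduler_type="linear",
+     seed=42,
+     fp16=True,  # "Native AMP" mixed precision
+ )
+
+ trainer = ORPOTrainer(model=model, args=config, train_dataset=train_ds, tokenizer=tokenizer)
+ trainer.train()
+ ```
+
+ Unlike DPO, ORPO needs no separate frozen reference model, so the memory footprint stays close to plain supervised fine-tuning.
+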
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
+ |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
+ | 3.5511 | 0.2758 | 500 | 3.4162 | -0.3146 | -0.3224 | 0.6303 | 0.0078 | -3.2238 | -3.1457 | -12.1919 | -12.3316 | 3.3464 | -0.6978 | 0.0837 |
+ | 3.3852 | 0.5517 | 1000 | 3.3345 | -0.3060 | -0.3152 | 0.6421 | 0.0092 | -3.1517 | -3.0602 | -3.3351 | -3.5024 | 3.2656 | -0.6894 | 0.0984 |
+ | 3.2734 | 0.8275 | 1500 | 3.2903 | -0.3011 | -0.3101 | 0.6309 | 0.0090 | -3.1013 | -3.0113 | -5.6602 | -5.7320 | 3.2211 | -0.6920 | 0.0975 |
+ | 3.104 | 1.1034 | 2000 | 3.2933 | -0.3021 | -0.3118 | 0.6371 | 0.0097 | -3.1182 | -3.0211 | -0.2253 | -0.3135 | 3.2237 | -0.6956 | 0.1062 |
+ | 2.8138 | 1.3792 | 2500 | 3.2816 | -0.3018 | -0.3125 | 0.6464 | 0.0107 | -3.1253 | -3.0179 | 1.3216 | 1.2346 | 3.2125 | -0.6916 | 0.1172 |
+ | 2.8178 | 1.6551 | 3000 | 3.2660 | -0.2998 | -0.3108 | 0.6383 | 0.0109 | -3.1080 | -2.9985 | -0.7475 | -0.8064 | 3.1968 | -0.6923 | 0.1204 |
+ | 2.8122 | 1.9309 | 3500 | 3.2586 | -0.2992 | -0.3104 | 0.6433 | 0.0112 | -3.1039 | -2.9922 | -2.8285 | -2.9509 | 3.1893 | -0.6925 | 0.1228 |
+ | 2.4931 | 2.2067 | 4000 | 3.3765 | -0.3130 | -0.3256 | 0.6427 | 0.0127 | -3.2563 | -3.1296 | 1.6707 | 1.5380 | 3.3063 | -0.7020 | 0.1392 |
+ | 2.3999 | 2.4826 | 4500 | 3.4109 | -0.3174 | -0.3298 | 0.6402 | 0.0125 | -3.2982 | -3.1736 | 1.4695 | 1.2634 | 3.3402 | -0.7069 | 0.1373 |
+ | 2.4254 | 2.7584 | 5000 | 3.3882 | -0.3150 | -0.3278 | 0.6439 | 0.0128 | -3.2781 | -3.1497 | 2.1282 | 1.9044 | 3.3180 | -0.7018 | 0.1416 |
+ | 2.373 | 3.0343 | 5500 | 3.5698 | -0.3370 | -0.3515 | 0.6408 | 0.0145 | -3.5149 | -3.3698 | 3.7150 | 3.6601 | 3.4983 | -0.7147 | 0.1595 |
+ | 2.0541 | 3.3101 | 6000 | 3.6256 | -0.3430 | -0.3570 | 0.6284 | 0.0140 | -3.5700 | -3.4302 | 1.1269 | 0.9714 | 3.5532 | -0.7240 | 0.1540 |
+ | 2.0641 | 3.5860 | 6500 | 3.6157 | -0.3425 | -0.3577 | 0.6445 | 0.0152 | -3.5771 | -3.4246 | -0.6703 | -0.8165 | 3.5439 | -0.7178 | 0.1665 |
+ | 2.0747 | 3.8618 | 7000 | 3.6335 | -0.3447 | -0.3598 | 0.6402 | 0.0151 | -3.5983 | -3.4474 | -0.1967 | -0.3291 | 3.5616 | -0.7193 | 0.1640 |
+ | 1.9377 | 4.1376 | 7500 | 3.8286 | -0.3671 | -0.3838 | 0.6445 | 0.0167 | -3.8381 | -3.6712 | -2.6871 | -2.8058 | 3.7557 | -0.7288 | 0.1800 |
+ | 1.8001 | 4.4135 | 8000 | 3.8629 | -0.3715 | -0.3882 | 0.6414 | 0.0168 | -3.8822 | -3.7146 | -3.4193 | -3.5370 | 3.7898 | -0.7315 | 0.1810 |
+ | 1.81 | 4.6893 | 8500 | 3.8574 | -0.3711 | -0.3879 | 0.6396 | 0.0168 | -3.8789 | -3.7110 | -4.2176 | -4.3406 | 3.7842 | -0.7321 | 0.1814 |
+ | 1.8108 | 4.9652 | 9000 | 3.8617 | -0.3716 | -0.3885 | 0.6390 | 0.0170 | -3.8851 | -3.7156 | -3.3968 | -3.5059 | 3.7885 | -0.7324 | 0.1830 |
+
+
+ ### Framework versions
+
+ - Transformers 4.44.2
+ - Pytorch 2.4.0+cu118
+ - Datasets 2.21.0
+ - Tokenizers 0.19.1
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 50256,
+   "eos_token_id": 50256,
+   "transformers_version": "4.44.2"
+ }
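
These defaults (GPT-2's shared BOS/EOS token id, 50256) are read automatically by `model.generate()`. A short sketch for inspecting them, again assuming the `KoNqUeRoR3891/HW2-orpo` repo id:

```python
from transformers import GenerationConfig

# Assumed repo id; substitute the actual Hub path if it differs.
gen_config = GenerationConfig.from_pretrained("KoNqUeRoR3891/HW2-orpo")
print(gen_config.bos_token_id, gen_config.eos_token_id)  # expect: 50256 50256
```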
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f0b0e2f0bc85de08e1043061a6fb0d9154a11d05328b0b3812361e4a540e5ef5
+ oid sha256:8559b29b32ba7042153e90f72d46d8be9d367e00df9dc812b45cbf82dfb4bcd1
  size 497774208