RLAIF-V-Dataset
This model is a fine-tuned version of llava-hf/llava-v1.6-mistral-7b-hf, trained on RLAIF-V-Dataset. It achieves the following results on the evaluation set:
- Loss: 0.4467
- Rewards/chosen: -3.1988
- Rewards/rejected: -5.9606
- Rewards/accuracies: 0.8163
- Rewards/margins: 2.7618
- Logps/rejected: -218.4866
- Logps/chosen: -190.4653
- Logits/rejected: -2.3732
- Logits/chosen: -2.4055
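The reward figures above are linked by simple arithmetic: the reported margin equals the chosen reward minus the rejected reward. A minimal check of that relationship, assuming this convention holds for the logged metrics (the function name `reward_margin` is illustrative and not taken from the training code):

```python
def reward_margin(reward_chosen: float, reward_rejected: float) -> float:
    """Margin between the chosen and rejected rewards (higher is better)."""
    return reward_chosen - reward_rejected


# Final evaluation values copied from the list above.
final_chosen = -3.1988
final_rejected = -5.9606

margin = round(reward_margin(final_chosen, final_rejected), 4)
print(margin)  # compare against the reported Rewards/margins value
```

Both rewards are negative, so the positive margin comes from the rejected responses being penalized more heavily than the chosen ones.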
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3.0
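The batch-size totals above follow from the per-device size, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and illustrates a cosine schedule with linear warmup in plain Python. The total step count (878) is an assumption estimated from the logged epoch fractions, not a value stated in this card, and `cosine_lr` is an illustrative helper rather than the exact scheduler used in training.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Number of examples contributing to one optimizer (or eval) step."""
    return per_device * num_devices * grad_accum_steps


def cosine_lr(step: int, total_steps: int, base_lr: float = 1e-6, warmup_steps: int = 10) -> float:
    """Linear warmup from 0 to base_lr, then half-cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Assumption: about 292.6 steps per epoch (step 50 is logged at epoch 0.1709), times 3 epochs.
ASSUMED_TOTAL_STEPS = 878

train_total = effective_batch_size(8, 8, 4)  # per-device 8, 8 GPUs, 4 accumulation steps
eval_total = effective_batch_size(8, 8)      # no accumulation during evaluation
peak_lr = cosine_lr(10, ASSUMED_TOTAL_STEPS)  # learning rate right after warmup
```

With these settings the learning rate reaches its peak of 1e-06 at step 10 and decays toward zero by the end of the third epoch.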
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5777 | 0.1709 | 50 | 0.5813 | -0.4541 | -1.0668 | 0.6683 | 0.6127 | -169.5483 | -163.0182 | -2.5153 | -2.5221 |
| 0.4982 | 0.3419 | 100 | 0.5161 | -0.9806 | -2.1974 | 0.7212 | 1.2168 | -180.8539 | -168.2832 | -2.4606 | -2.4847 |
| 0.4954 | 0.5128 | 150 | 0.4770 | -1.5352 | -3.2803 | 0.7548 | 1.7451 | -191.6833 | -173.8291 | -2.0991 | -2.1473 |
| 0.4567 | 0.6838 | 200 | 0.4598 | -1.1951 | -2.8406 | 0.7596 | 1.6455 | -187.2865 | -170.4288 | -2.1090 | -2.1587 |
| 0.4873 | 0.8547 | 250 | 0.4487 | -1.9205 | -3.6640 | 0.7635 | 1.7435 | -195.5203 | -177.6819 | -2.5457 | -2.5724 |
| 0.2176 | 1.0256 | 300 | 0.4383 | -1.1991 | -3.1202 | 0.7846 | 1.9211 | -190.0823 | -170.4688 | -2.3130 | -2.3490 |
| 0.2095 | 1.1966 | 350 | 0.4537 | -2.3545 | -4.8732 | 0.7933 | 2.5188 | -207.6123 | -182.0219 | -2.3656 | -2.3942 |
| 0.1952 | 1.3675 | 400 | 0.4353 | -1.9722 | -4.1870 | 0.7962 | 2.2148 | -200.7505 | -178.1995 | -2.3058 | -2.3361 |
| 0.1819 | 1.5385 | 450 | 0.4321 | -2.0466 | -4.4416 | 0.8077 | 2.3950 | -203.2960 | -178.9431 | -2.2282 | -2.2612 |
| 0.1932 | 1.7094 | 500 | 0.4247 | -1.8597 | -4.1324 | 0.8087 | 2.2727 | -200.2041 | -177.0739 | -2.2659 | -2.2970 |
| 0.1921 | 1.8803 | 550 | 0.4131 | -2.3219 | -4.8505 | 0.8183 | 2.5286 | -207.3855 | -181.6965 | -2.3691 | -2.3985 |
| 0.0868 | 2.0513 | 600 | 0.4392 | -2.7792 | -5.2414 | 0.8135 | 2.4623 | -211.2946 | -186.2690 | -2.4330 | -2.4615 |
| 0.0825 | 2.2222 | 650 | 0.4447 | -3.2209 | -6.0852 | 0.8154 | 2.8642 | -219.7319 | -190.6867 | -2.3962 | -2.4295 |
| 0.0925 | 2.3932 | 700 | 0.4449 | -3.2092 | -6.0685 | 0.8183 | 2.8593 | -219.5651 | -190.5695 | -2.3854 | -2.4189 |
| 0.0754 | 2.5641 | 750 | 0.4567 | -3.3570 | -6.0710 | 0.8115 | 2.7141 | -219.5908 | -192.0472 | -2.3789 | -2.4105 |
| 0.0707 | 2.7350 | 800 | 0.4484 | -3.2447 | -6.0070 | 0.8135 | 2.7622 | -218.9498 | -190.9248 | -2.3739 | -2.4066 |
| 0.0739 | 2.9060 | 850 | 0.4468 | -3.2032 | -5.9670 | 0.8173 | 2.7638 | -218.5504 | -190.5096 | -2.3732 | -2.4054 |
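One way to read the table above is to look for the logged step with the lowest validation loss. The sketch below does that over the (step, training loss, validation loss) columns copied from the table. The helper name `best_checkpoint` is illustrative, and whether a checkpoint was actually saved at that step is not stated in this card.

```python
# (step, training_loss, validation_loss) copied from the training results table.
ROWS = [
    (50, 0.5777, 0.5813), (100, 0.4982, 0.5161), (150, 0.4954, 0.4770),
    (200, 0.4567, 0.4598), (250, 0.4873, 0.4487), (300, 0.2176, 0.4383),
    (350, 0.2095, 0.4537), (400, 0.1952, 0.4353), (450, 0.1819, 0.4321),
    (500, 0.1932, 0.4247), (550, 0.1921, 0.4131), (600, 0.0868, 0.4392),
    (650, 0.0825, 0.4447), (700, 0.0925, 0.4449), (750, 0.0754, 0.4567),
    (800, 0.0707, 0.4484), (850, 0.0739, 0.4468),
]


def best_checkpoint(rows):
    """Return (step, validation_loss) for the row with the lowest validation loss."""
    step, _train_loss, val_loss = min(rows, key=lambda row: row[2])
    return step, val_loss


best_step, best_val_loss = best_checkpoint(ROWS)
print(best_step, best_val_loss)
```

In the logged rows, training loss keeps falling through the third epoch while validation loss stops improving after the second, which may indicate overfitting in the final epoch.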
Framework versions
- Transformers 4.45.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.20.3
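To reproduce the environment, it can help to compare installed package versions against the framework versions listed above. The sketch below uses only the standard library. The pip distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) and the helper names are assumptions made for illustration.

```python
from importlib import metadata

# Versions from the "Framework versions" list; keys are assumed pip distribution names.
EXPECTED = {
    "transformers": "4.45.2",
    "torch": "2.4.0+cu121",
    "datasets": "2.21.0",
    "tokenizers": "0.20.3",
}


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    found = {name: lookup(name) for name in expected}
    return {name: (want, found[name]) for name, want in expected.items() if found[name] != want}


if __name__ == "__main__":
    for name, (want, got) in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {want}, found {got}")
```

An empty result means the local environment matches the listed versions exactly. A mismatch is not necessarily a problem, only a difference worth noting.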
Model tree for htlou/mm-interp-RLAIF-V-Dataset
Base model
llava-hf/llava-v1.6-mistral-7b-hf