|
---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.2
license: apache-2.0
language:
- en
---
|
|
|
# Suri-I-ORPO |
|
Suri-I-ORPO is a version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) fine-tuned with instructional odds ratio preference optimization (I-ORPO). Please see [our paper](TODO) for more details on the method.
|
|
|
## Model Details
|
|
|
### Model Description |
|
|
|
- **Language(s) (NLP):** English |
|
- **License:** Apache-2.0 |
|
- **Finetuned from model:** [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) |
|
|
|
### Model Sources |
|
|
|
- **Repository:** [GitHub repository](https://github.com/chtmp223/suri), which contains the code to reconstruct the books3 subset of the dataset.
|
- **Paper:** TODO |
|
- **Demo:** [Website](https://chtmp223.github.io/suri) |
|
|
|
## Getting Started
|
|
|
Use the code in [this repository](https://github.com/chtmp223/suri) for training and inference. |
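
As a minimal sketch, the released LoRA adapter can be loaded on top of the base model with `peft`. The adapter ID below is an assumption for illustration; the repository above documents the exact inference pipeline.

```python
# Minimal inference sketch with transformers + peft (a sketch, not the exact
# pipeline from the repository). NOTE: "chtmp223/suri-i-orpo" is an assumed
# adapter ID; substitute the actual Hub ID of this model.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "chtmp223/suri-i-orpo")  # assumed adapter ID

# Mistral-7B-Instruct ships a chat template, so format the prompt with it.
messages = [{"role": "user", "content": "Write a multi-paragraph story about a lighthouse keeper."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=1024, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```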
|
|
|
|
|
## Training Details
|
|
|
### Training Data |
|
|
|
The model was trained on [chtmp223/suri](https://huggingface.co/datasets/chtmp223/suri).
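
For a quick look at the data, the dataset can be loaded with the `datasets` library. The split name below is an assumption; check the dataset page for the actual schema.

```python
# Peek at the Suri dataset. The "train" split name is an assumption; see the
# dataset page on the Hub for the actual splits and columns.
from datasets import load_dataset

ds = load_dataset("chtmp223/suri", split="train")
print(ds)            # features and number of rows
print(ds[0].keys())  # column names of the first example
```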
|
|
|
### Training Procedure |
|
|
|
| **Configuration**                  | **Value**    |
|------------------------------------|--------------|
| Hardware (training and inference)  | 4x A100      |
| Tracking                           | wandb        |
| lora_r                             | 16           |
| lora_alpha                         | 16           |
| lora_dropout                       | 0.05         |
| beta                               | 0.4          |
| gradient_accumulation_steps        | 1            |
| gradient_checkpointing             | True         |
| learning_rate                      | 5.0e-5       |
| lr_scheduler_type                  | cosine       |
| max_length                         | 15024        |
| max_completion_length              | 15000        |
| max_prompt_length                  | 5000         |
| num_train_epochs                   | 2            |
| optim                              | adamw_torch  |
| per_device_train_batch_size        | 1            |
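
As a rough guide, the hyperparameters above map onto `peft` and `trl` configuration objects as in the sketch below. This assumes `trl`'s `ORPOConfig`; the authoritative I-ORPO training script (including the instruction-preference data handling) lives in the GitHub repository.

```python
# Sketch: mapping the hyperparameter table onto peft/trl config objects.
# This is illustrative only; the actual training code is in the repository.
from peft import LoraConfig
from trl import ORPOConfig

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

training_args = ORPOConfig(
    beta=0.4,
    max_length=15024,
    max_prompt_length=5000,
    max_completion_length=15000,
    learning_rate=5.0e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=2,
    optim="adamw_torch",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    report_to="wandb",
    output_dir="suri-i-orpo",  # placeholder; required by TrainingArguments
)
```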
|
|
|
|
|
#### Software
|
|
|
The training code is adapted from the [Alignment Handbook](https://github.com/huggingface/alignment-handbook) and [TRL](https://github.com/huggingface/trl).
|
|
|
## Citation
|
|
|
```
TODO
```
|
|
|
### Framework versions
|
|
|
- PEFT 0.11.1 |