---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.2
license: apache-2.0
language:
- en
---
# Suri-I-ORPO
Suri-I-ORPO is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) trained with instructional odds ratio preference optimization (I-ORPO). Please see [our paper](TODO) for more details on the method.
## πŸ“’ Model Details
### Model Description
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
### Model Sources
- **Repository:** [Github repository](https://github.com/chtmp223/suri) -- contains code to reconstruct the books3 subset.
- **Paper:** TODO
- **Demo:** [Website](https://chtmp223.github.io/suri)
## ⚠️ Getting Started
Use the code in [this repository](https://github.com/chtmp223/suri) for training and inference.
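For a quick local test, the LoRA adapter can also be loaded directly with 🤗 PEFT and Transformers. The sketch below is not the reference inference script: the adapter id is assumed from this repository's name and the prompt is a generic placeholder; see the repository above for the exact instruction format used for long-form generation.

```python
# Minimal inference sketch; the adapter id below is assumed from this repo's name.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "chtmp223/suri-i-orpo"
base_id = "mistralai/Mistral-7B-Instruct-v0.2"

# Loads the base model and applies the LoRA adapter on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

messages = [{"role": "user", "content": "Write a short story about a lighthouse keeper."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```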
## πŸ’» Training Details
### Training Data
[chtmp223/suri](https://huggingface.co/datasets/chtmp223/suri)
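For a quick look at the data, the dataset can be pulled with 🤗 Datasets. This is a hedged sketch: the split and column names are not documented in this card, so inspect them rather than assuming a layout.

```python
from datasets import load_dataset

# Dataset id taken from the link above.
ds = load_dataset("chtmp223/suri")
print(ds)                           # available splits and columns
first_split = next(iter(ds.values()))
print(first_split[0])               # one raw example
```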
### Training Procedure
| **Configurations** | **Values** |
|----------------------------------|--------------|
| Hardware (Training and Inference)| 4 × A100 GPUs|
| Tracking | wandb |
| lora_r | 16 |
| lora_alpha | 16 |
| lora_dropout | 0.05 |
| beta | 0.4 |
| gradient_accumulation_steps | 1 |
| gradient_checkpointing | True |
| learning_rate | 5.0e-5 |
| lr_scheduler_type | cosine |
| max_length | 15024 |
| max_completion_length | 15000 |
| max_prompt_length | 5000 |
| num_train_epochs | 2 |
| optim | adamw_torch |
| per_device_train_batch_size | 1 |
#### πŸ€— Software
Training code is adapted from [Alignment Handbook](https://github.com/huggingface/alignment-handbook) and [Trl](https://github.com/huggingface/trl).
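As a rough illustration only (this is not the authors' I-ORPO implementation, which lives in the GitHub repository above), the hyperparameters in the table map onto a standard TRL ORPO run roughly as follows. The dataset split, column names, and some TRL argument names are assumptions.

```python
# Sketch mapping the table above onto peft.LoraConfig + TRL's standard ORPOTrainer.
# NOT the authors' I-ORPO code; column names, split, and TRL version details are assumed.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# target_modules are not listed in this card, so PEFT's defaults for Mistral are used.
peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

args = ORPOConfig(
    output_dir="suri-i-orpo-sketch",
    beta=0.4,
    max_length=15024,
    max_prompt_length=5000,
    max_completion_length=15000,
    learning_rate=5.0e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},  # avoids grad issues with LoRA
    optim="adamw_torch",
    report_to="wandb",
)

# ORPOTrainer expects "prompt"/"chosen"/"rejected" columns; consult the Suri repo
# for the actual preprocessing. The "train" split name is an assumption.
train_dataset = load_dataset("chtmp223/suri", split="train")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,      # "processing_class=" in newer TRL releases
    peft_config=peft_config,
)
trainer.train()
```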
## πŸ“œ Citation
```
TODO
```
### βš™οΈ Framework versions
- PEFT 0.11.1