---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.2
license: apache-2.0
language:
- en
---
# Suri-I-ORPO
Suri-I-ORPO is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) trained with instructional odds ratio preference optimization (I-ORPO). Please see [our paper](TODO) for more details on the method.
## πŸ“’ Model Details
### Model Description
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
### Model Sources
- **Repository:** [Github repository](https://github.com/chtmp223/suri) -- contains code to reconstruct the books3 subset.
- **Paper:** TODO
- **Demo:** [Website](https://chtmp223.github.io/suri)
## ⚠️ Getting Started
Use the code in [this repository](https://github.com/chtmp223/suri) for training and inference.
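For a quick local test, the LoRA adapter can also be loaded directly with 🤗 PEFT and Transformers. The sketch below is not the reference inference script: the adapter id is assumed from this repository's name and the prompt is a generic placeholder; see the repository above for the exact instruction format used for long-form generation.

```python
# Minimal inference sketch; the adapter id below is assumed from this repo's name.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "chtmp223/suri-i-orpo"
base_id = "mistralai/Mistral-7B-Instruct-v0.2"

# Loads the base model and applies the LoRA adapter on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

messages = [{"role": "user", "content": "Write a short story about a lighthouse keeper."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```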
## πŸ’» Training Details
### Training Data
[chtmp223/suri](https://huggingface.co/datasets/chtmp223/suri)
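For a quick look at the data, the dataset can be pulled with 🤗 Datasets. This is a hedged sketch: the split and column names are not documented in this card, so inspect them rather than assuming a layout.

```python
from datasets import load_dataset

# Dataset id taken from the link above.
ds = load_dataset("chtmp223/suri")
print(ds)                           # available splits and columns
first_split = next(iter(ds.values()))
print(first_split[0])               # one raw example
```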
### Training Procedure
| **Configurations** | **Values** |
|----------------------------------|--------------|
| Hardware (Training and Inference)| 4 × A100 GPUs|
| Tracking | wandb |
| lora_r | 16 |
| lora_alpha | 16 |
| lora_dropout | 0.05 |
| beta | 0.4 |
| gradient_accumulation_steps | 1 |
| gradient_checkpointing | True |
| learning_rate | 5.0e-5 |
| lr_scheduler_type | cosine |
| max_length | 15024 |
| max_completion_length | 15000 |
| max_prompt_length | 5000 |
| num_train_epochs | 2 |
| optim | adamw_torch |
| per_device_train_batch_size | 1 |
#### πŸ€— Software
Training code is adapted from [Alignment Handbook](https://github.com/huggingface/alignment-handbook) and [Trl](https://github.com/huggingface/trl).
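As a rough illustration only (this is not the authors' I-ORPO implementation, which lives in the GitHub repository above), the hyperparameters in the table map onto a standard TRL ORPO run roughly as follows. The dataset split, column names, and some TRL argument names are assumptions.

```python
# Sketch mapping the table above onto peft.LoraConfig + TRL's standard ORPOTrainer.
# NOT the authors' I-ORPO code; column names, split, and TRL version details are assumed.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# target_modules are not listed in this card, so PEFT's defaults for Mistral are used.
peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

args = ORPOConfig(
    output_dir="suri-i-orpo-sketch",
    beta=0.4,
    max_length=15024,
    max_prompt_length=5000,
    max_completion_length=15000,
    learning_rate=5.0e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},  # avoids grad issues with LoRA
    optim="adamw_torch",
    report_to="wandb",
)

# ORPOTrainer expects "prompt"/"chosen"/"rejected" columns; consult the Suri repo
# for the actual preprocessing. The "train" split name is an assumption.
train_dataset = load_dataset("chtmp223/suri", split="train")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,      # "processing_class=" in newer TRL releases
    peft_config=peft_config,
)
trainer.train()
```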
## πŸ“œ Citation
```
TODO
```
### βš™οΈ Framework versions
- PEFT 0.11.1