---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.2
license: apache-2.0
language:
- en
---

# Suri-I-ORPO
Suri-I-ORPO is a version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) fine-tuned with instructional odds ratio preference optimization (I-ORPO). Please check [our paper](TODO) for more details on the method.

## 📒 Model Details

### Model Description

- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

### Model Sources

- **Repository:** [GitHub repository](https://github.com/chtmp223/suri), which contains code to reconstruct the books3 subset.
- **Paper:** TODO
- **Demo:** [Website](https://chtmp223.github.io/suri)

## ⚠️ Getting Started

Use the code in [this repository](https://github.com/chtmp223/suri) for training and inference. 
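
For a quick local test, the adapter can also be attached to the base model with Transformers and PEFT. The snippet below is a minimal sketch rather than the project's official inference script: the adapter repository id and the generation settings are assumptions.

```python
# Minimal inference sketch using transformers + peft.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "chtmp223/suri-i-orpo"  # assumption: replace with this repo's actual Hub id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA adapter

# Mistral-Instruct uses a chat template; sampling settings here are illustrative only.
messages = [{"role": "user", "content": "Write a short story about a lighthouse keeper."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```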


## 💻 Training Details

### Training Data

[chtmp223/suri](https://huggingface.co/datasets/chtmp223/suri)
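
The dataset can be inspected directly from the Hub. In the short sketch below, the split and column names are assumptions, so check the dataset card for the actual schema.

```python
# Quick look at the training data.
from datasets import load_dataset

ds = load_dataset("chtmp223/suri", split="train")  # assumption: "train" split exists
print(ds)             # features and number of rows
print(ds[0].keys())   # column names of the first example
```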

### Training Procedure

| **Configurations**               | **Values**   |
|----------------------------------|--------------|
| Hardware (Training and Inference)| 4xA100s      |
| Tracking                         | wandb        |
| lora_r                           | 16           |
| lora_alpha                       | 16           |
| lora_dropout                     | 0.05         |
| beta                             | 0.4          |
| gradient_accumulation_steps      | 1            |
| gradient_checkpointing           | True         |
| learning_rate                    | 5.0e-5       |
| lr_scheduler_type                | cosine       |
| max_length                       | 15024        |
| max_completion_length            | 15000        |
| max_prompt_length                | 5000         |
| num_train_epochs                 | 2            |
| optim                            | adamw_torch  |
| per_device_train_batch_size      | 1            |
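
The table maps closely onto TRL's ORPO trainer combined with a PEFT LoRA config. The sketch below shows one way these settings could be wired up; note that it uses the standard ORPO objective rather than the paper's I-ORPO variant, and the dataset schema, output directory, and precision flags are assumptions. The GitHub repository remains the authoritative training code.

```python
# Hedged sketch: vanilla ORPO from trl with the hyperparameters listed above.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# LoRA settings from the table above.
peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# Preference-optimization settings from the table above.
args = ORPOConfig(
    output_dir="suri-i-orpo",          # assumption
    beta=0.4,
    max_length=15024,
    max_prompt_length=5000,
    max_completion_length=15000,
    learning_rate=5.0e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=2,
    optim="adamw_torch",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    bf16=True,                         # assumption: mixed precision on A100s
    report_to="wandb",
)

# ORPOTrainer expects prompt/chosen/rejected columns; the suri dataset may need
# preprocessing to match this schema.
train_dataset = load_dataset("chtmp223/suri", split="train")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,   # newer trl versions use processing_class instead
    peft_config=peft_config,
)
trainer.train()
```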


#### 🤗 Software

Training code is adapted from the [Alignment Handbook](https://github.com/huggingface/alignment-handbook) and [TRL](https://github.com/huggingface/trl).

## 📜 Citation

```
TODO
```

### ⚙️ Framework versions

- PEFT 0.11.1