TheNetherWatcher committed · verified
Commit 702849c · 1 Parent(s): 73c362d

Create README.md

Files changed (1): README.md +57 −0 (added)
---
library_name: transformers
language:
- en
---

# Phi-3-mini-128k-instruct ORPO Model

[Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) fine-tuned on the Text-to-SQL downstream task using [Odds Ratio Preference Optimization (ORPO)](https://arxiv.org/pdf/2403.07691).

## Details

A 4-bit quantized version of [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) was fine-tuned on [zerolink/zsql-sqlite-dpo](https://huggingface.co/datasets/zerolink/zsql-sqlite-dpo). The trained adapters were then merged back into the base model with PEFT.

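The adapter-merge step could look roughly like this (a hedged sketch using PEFT's standard `merge_and_unload` API; the adapter path and output directory are placeholders, not the repo's actual paths):

```python
# Sketch: fold trained LoRA adapters back into the base model with PEFT.
# The adapter path and output directory below are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
model = PeftModel.from_pretrained(base, "path/to/orpo-adapters")
merged = model.merge_and_unload()  # folds adapter weights into the base layers
merged.save_pretrained("phi-3-sql-orpo-merged")
```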
### Odds Ratio Preference Optimization (ORPO)

ORPO penalizes "rejected" samples while increasing the likelihood of "chosen" samples. It builds on DPO but incorporates a ranking of preferences: the model learns not only which outputs are preferred but also their relative ranking.

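The preference term ORPO adds to the standard language-modeling loss can be sketched in a few lines. This is a simplified scalar version for intuition (the full loss combines it with the SFT loss via a weighting coefficient, which is omitted here):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def log_odds(p: float) -> float:
    # odds(p) = p / (1 - p); computed in log space for stability
    return math.log(p) - math.log(1.0 - p)

def orpo_penalty(p_chosen: float, p_rejected: float) -> float:
    """Odds-ratio term of the ORPO loss: -log sigmoid(log odds ratio).

    p_chosen / p_rejected are the model's likelihoods of the chosen
    and rejected responses. The penalty shrinks when the model favors
    the chosen answer and grows when it favors the rejected one.
    """
    log_or = log_odds(p_chosen) - log_odds(p_rejected)
    return -math.log(sigmoid(log_or))
```

When the two likelihoods are equal the penalty is `log 2`; it decreases toward 0 as the chosen answer becomes more likely than the rejected one.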
#### Dataset

The model was fine-tuned on the [zerolink/zsql-sqlite-dpo](https://huggingface.co/datasets/zerolink/zsql-sqlite-dpo) dataset (250,000 entries in total).

Each entry needs the following columns:

- Schema
- Question
- Rejected
- Chosen
- Weight

For example:

- Schema: `CREATE TABLE table_name_56 (location TEXT, year INTEGER)`
- Question: "What location is previous to 1994?"
- Rejected: `SELECT location FROM table_name_56 WHERE year < 1994`
- Chosen: `SELECT "location" FROM "table_name_56" WHERE "year" < 1994`
- Weight: 0.056641

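A row in this shape has to be mapped onto the prompt/chosen/rejected triple that preference trainers consume. A minimal sketch, assuming lowercase column keys and a simple prompt template (both are illustrative, not the exact ones used in training):

```python
def to_preference_example(row: dict) -> dict:
    """Map one zsql-sqlite-dpo-style row (hypothetical keys) onto the
    prompt/chosen/rejected triple expected by a preference trainer."""
    prompt = (
        f"Schema: {row['schema']}\n"
        f"Question: {row['question']}\n"
        "SQL:"
    )
    return {
        "prompt": prompt,
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }

example = {
    "schema": "CREATE TABLE table_name_56 (location TEXT, year INTEGER)",
    "question": "What location is previous to 1994?",
    "rejected": "SELECT location FROM table_name_56 WHERE year < 1994",
    "chosen": 'SELECT "location" FROM "table_name_56" WHERE "year" < 1994',
}
```

Applying `to_preference_example(example)` yields a single prompt string plus the chosen and rejected completions for that schema/question pair.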
### Training Parameters

* QLoRA Parameters
  - r = 16
  - target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
  - lora_alpha = 16
  - lora_dropout = 0
  - bias = "none"
  - use_gradient_checkpointing = "unsloth"
  - random_state = 3407
  - Total Trainable Parameters: 41,943,040

* [ORPO Trainer](https://huggingface.co/docs/trl/main/en/orpo_trainer) Config
  - num_epochs = 1
  - max_steps = 30
  - per_device_train_batch_size = 2
  - gradient_accumulation_steps = 4
  - optim = "adamw_8bit"
  - lr_scheduler_type = "linear"
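A hedged sketch of how the parameters above might map onto `peft` and `trl` config objects (exact argument names can vary across library versions, and `output_dir` is an assumption required by the trainer, not a value from this card):

```python
# Illustrative mapping of the listed hyperparameters onto peft/trl configs.
from peft import LoraConfig
from trl import ORPOConfig

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

orpo_config = ORPOConfig(
    num_train_epochs=1,
    max_steps=30,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    optim="adamw_8bit",
    lr_scheduler_type="linear",
    output_dir="outputs",  # assumption: not specified in the card
)
```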