---
library_name: transformers
language:
- en
---

# Phi-3-mini-128k-instruct ORPO Model

[Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) fine-tuned on the Text-to-SQL downstream task using [Odds Ratio Preference Optimization (ORPO)](https://arxiv.org/pdf/2403.07691).

## Details

A 4-bit quantized version of the [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) model was fine-tuned on [zerolink/zsql-sqlite-dpo](https://huggingface.co/datasets/zerolink/zsql-sqlite-dpo).
PEFT was then used to merge the trained adapters back into the base model, as sketched below.
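
A minimal sketch of this load-and-merge flow, assuming the standard transformers / peft / bitsandbytes APIs; the adapter repository id below is a placeholder, not this repo's actual id:

```python
# A minimal sketch, assuming the standard transformers/peft/bitsandbytes APIs.
# "your-username/phi3-orpo-adapters" is a placeholder adapter repo id.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 4-bit (NF4) quantized load of the base model, as used during fine-tuning.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_4bit = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# For merging, reload the base in half precision (merging into quantized
# weights is lossy), attach the trained adapters, and fold them in.
base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
merged = PeftModel.from_pretrained(base, "your-username/phi3-orpo-adapters")
merged = merged.merge_and_unload()
merged.save_pretrained("phi3-mini-orpo-merged")
```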

### Odds Ratio Preference Optimization (ORPO)

The goal of ORPO is to penalize the "rejected" samples while increasing the likelihood of the "chosen" samples. It builds on DPO, but instead of relying on a separate reference model, it adds an odds-ratio penalty to the standard fine-tuning loss, so the model learns not only which outputs are preferred but also how strongly to prefer them.

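For reference, the objective from the ORPO paper combines the usual supervised fine-tuning loss with an odds-ratio term weighted by $\lambda$ (a sketch of the paper's formulation):

$$
\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}_{(x,\, y_w,\, y_l)}\big[\mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}\big],
\qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right)
$$

where $\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$, $y_w$ is the chosen response, and $y_l$ is the rejected one.
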
#### Dataset
The model was fine-tuned on the [zerolink/zsql-sqlite-dpo](https://huggingface.co/datasets/zerolink/zsql-sqlite-dpo) dataset.
Total entries in the dataset: 250,000

The dataset needs to be in the following format, with at least these columns:
- Schema
- Question
- Rejected
- Chosen
- Weight

For example:
- Schema: `CREATE TABLE table_name_56 (location TEXT, year INTEGER)`
- Question: "What location is previous to 1994?"
- Rejected: `SELECT location FROM table_name_56 WHERE year < 1994`
- Chosen: `SELECT "location" FROM "table_name_56" WHERE "year" < 1994`
- Weight: 0.056641

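TRL's ORPO trainer consumes prompt / chosen / rejected string columns, so the raw rows have to be mapped into that shape. A minimal sketch, assuming lowercase column names in the dataset and an illustrative prompt template (neither is confirmed to match this model's exact training setup):

```python
# A minimal sketch; the column names ("schema", "question", "chosen",
# "rejected") and the prompt template are assumptions, not confirmed.
from datasets import load_dataset

dataset = load_dataset("zerolink/zsql-sqlite-dpo", split="train")

def to_orpo_format(row):
    # ORPOTrainer expects prompt/chosen/rejected string columns.
    return {
        "prompt": f"Schema: {row['schema']}\nQuestion: {row['question']}\nSQL: ",
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }

dataset = dataset.map(to_orpo_format, remove_columns=dataset.column_names)
```
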
### Training Parameters

* QLoRA Parameters (see the sketch after this section)
  - r = 16
  - target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
  - lora_alpha = 16
  - lora_dropout = 0
  - bias = "none"
  - use_gradient_checkpointing = "unsloth"
  - random_state = 3407

  Total Trainable Parameters: 41,943,040

* [ORPO Trainer](https://huggingface.co/docs/trl/main/en/orpo_trainer) Config
  - num_epochs = 1
  - max_steps = 30
  - per_device_train_batch_size = 2
  - gradient_accumulation_steps = 4
  - optim = "adamw_8bit"
  - lr_scheduler_type = "linear"
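
Putting the two parameter groups together, a sketch of the training setup, assuming unsloth's FastLanguageModel API and TRL's ORPOTrainer; `dataset` is the prompt/chosen/rejected dataset prepared earlier, and `output_dir` is arbitrary:

```python
# A sketch under the parameters listed above, assuming the unsloth and TRL
# APIs; not a verbatim copy of the author's training script.
from unsloth import FastLanguageModel
from trl import ORPOConfig, ORPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="microsoft/Phi-3-mini-128k-instruct",
    load_in_4bit=True,
)

# QLoRA adapters with the parameters reported above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

# ORPO training arguments with the hyperparameters reported above.
args = ORPOConfig(
    num_train_epochs=1,
    max_steps=30,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    optim="adamw_8bit",
    lr_scheduler_type="linear",
    output_dir="outputs",
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL versions take processing_class= instead
)
trainer.train()
```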