Update README.md
Browse files
README.md
CHANGED
@@ -65,4 +65,43 @@ Cite TRL as:
     publisher = {GitHub},
     howpublished = {\url{https://github.com/huggingface/trl}}
 }
-```
+```
+
+# Train the model
+training_args = DPOConfig(
+    output_dir="llava-lora-12-06-rpo-0.1",
+    bf16=True,
+    gradient_checkpointing=True,
+    per_device_train_batch_size=8,
+    per_device_eval_batch_size=4,
+    gradient_accumulation_steps=32,
+    evaluation_strategy="steps",
+    eval_steps=1,
+    learning_rate=1e-5,
+    beta=0.1,
+    warmup_ratio=0.1,
+    lr_scheduler_type="cosine",
+    num_train_epochs=2,
+    rpo_alpha=0.1,
+    dataset_num_proc=32,  # tokenization will use 32 processes
+    dataloader_num_workers=32,  # data loading will use 32 workers
+    logging_steps=1,
+)
+
+# Define LoRA configuration with specified rank
+lora_config = LoraConfig(
+    r=64,  # Set rank to 64
+    lora_alpha=128,  # Set scaling factor to 128
+    target_modules="all-linear",  # Target all linear layers
+    lora_dropout=0.1,
+)
+
+trainer = DPOTrainer(
+    model,
+    ref_model=None,  # not needed when using peft
+    args=training_args,
+    train_dataset=train_dataset,
+    eval_dataset=eval_dataset,
+    tokenizer=processor,
+    peft_config=lora_config,
+)