---
language:
- en
license: apache-2.0
tags:
- generated_from_trainer
base_model: microsoft/phi-2
pipeline_tag: text-generation
---

# outputs

This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2), trained with [trl](https://github.com/huggingface/trl) on the [ultrafeedback dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).

# What's new

A test of the [ORPO: Monolithic Preference Optimization without Reference Model](https://arxiv.org/pdf/2403.07691.pdf) method using the trl library.

## How to reproduce

```bash
accelerate launch --config_file=/path/to/trl/examples/accelerate_configs/deepspeed_zero2.yaml \
    --num_processes 8 \
    /path/to/trl/scripts/orpo.py \
    --model_name_or_path="microsoft/phi-2" \
    --per_device_train_batch_size 1 \
    --max_steps 8000 \
    --learning_rate 8e-5 \
    --gradient_accumulation_steps 1 \
    --logging_steps 20 \
    --eval_steps 2000 \
    --output_dir="orpo-lora-phi2" \
    --optim rmsprop \
    --warmup_steps 150 \
    --bf16 \
    --logging_first_step \
    --no_remove_unused_columns \
    --use_peft \
    --lora_r=16 \
    --lora_alpha=16 \
    --dataset HuggingFaceH4/ultrafeedback_binarized
```
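## Note on the ORPO objective

ORPO augments the standard SFT loss with an odds-ratio penalty between the chosen and rejected completions, with no reference model. As a rough illustration only (not the trl implementation, which operates on token-level log-probabilities inside `ORPOTrainer`), the per-example odds-ratio term from the paper can be sketched from two sequence log-probabilities:

```python
import math

def log_odds(logp: float) -> float:
    """log odds(p) = log(p / (1 - p)), computed from log p for stability."""
    # log(1 - p) via log1p(-exp(log p)); requires logp < 0, i.e. p < 1
    return logp - math.log1p(-math.exp(logp))

def orpo_odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    """ORPO's L_OR = -log sigmoid(log odds(chosen) - log odds(rejected)).

    Inputs are (average) log-probabilities of the chosen and rejected
    completions under the current policy. In the full ORPO objective this
    term is scaled by a weight lambda and added to the SFT loss.
    """
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log sigmoid(ratio), written directly
    return math.log1p(math.exp(-ratio))
```

The loss is `log 2` when the two completions are equally likely and shrinks as the policy assigns higher odds to the chosen completion than to the rejected one, which is what pushes preference alignment without a frozen reference model.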