---
license: mit
library_name: transformers
tags:
- trl
- orpo
- generated_from_trainer
datasets:
- argilla/distilabel-capybara-dpo-7k-binarized
base_model: wandb/Mistral-7B-v0.2
---

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/llm_surgery/mistral_zephyr_orpo_v0.2?nw=nwusercapecape)

# Mistral 7B Zephyr Orpo

The [Zephyr Orpo](https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1/) recipe applied on top of Mistral 7B v0.2 (the new ORPO recipe on the new Mistral base model).

## Model description

- **Model type:** A 7.2B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
- **Language(s) (NLP):** Primarily English
- **Finetuned from model:** [wandb/Mistral-7B-v0.2](https://huggingface.co/wandb/Mistral-7B-v0.2)
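
Below is a minimal inference sketch with `transformers`. The repo id is a placeholder (the card does not state this model's Hub location), and it assumes the tokenizer ships a Zephyr-style chat template:

```python
# Minimal inference sketch; the repo id below is a placeholder,
# not the model's confirmed Hub location.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/zephyr-orpo-7b-v0.2"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "Explain ORPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```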

## Recipe

We trained using the [alignment handbook ORPO recipe](https://github.com/huggingface/alignment-handbook/blob/main/scripts/run_orpo.py), logging to W&B.
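
For readers who want the shape of the training loop without the full handbook config, here is a sketch of an equivalent setup in plain TRL. The hyperparameter values are illustrative, not the ones used for this run (those live in the W&B workspace), and the raw dataset stores chosen/rejected as message lists that the handbook script first renders through the chat template:

```python
# Sketch of an ORPO run with TRL, approximating the handbook script.
# Hyperparameters are illustrative; the actual values live in the W&B run.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "wandb/Mistral-7B-v0.2"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The handbook first formats each example into prompt/chosen/rejected text
# via the chat template; that preprocessing step is omitted here.
dataset = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized", split="train")

args = ORPOConfig(
    output_dir="zephyr-orpo-7b-v0.2",
    beta=0.05,           # weight on the odds-ratio loss term; illustrative
    num_train_epochs=1,  # illustrative
    report_to="wandb",   # log metrics to Weights & Biases
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older trl versions
)
trainer.train()
```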

Visit the [W&B workspace here](https://wandb.ai/llm_surgery/mistral_zephyr_orpo_v0.2?nw=nwusercapecape) to explore the training logs.

## Results

- MT-Bench:

```
########## First turn ##########
                           score
model               turn
zephyr-orpo-7b-v0.2 1    7.44375

########## Second turn ##########
                           score
model               turn
zephyr-orpo-7b-v0.2 2      6.875

########## Average ##########
                           score
model
zephyr-orpo-7b-v0.2     7.159375
```

## Trained on a single H100 for 2 hours!