ContextualAI
/

Contextual_KTO_Mistral_PairRM

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Contextual_KTO_Mistral_PairRM / README.md

xwinxu's picture

Update README.md

98bee13 verified 8 months ago

|

history blame contribute delete

2.23 kB

	---
	language:
	- en
	license: apache-2.0
	tags:
	- human feedback
	- rlhf
	- preferences
	- alignment
	- HALO
	- halos
	- dpo
	- rl
	datasets:
	- snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
	metrics:
	- accuracy
	---

	This repo contains the model and tokenizer checkpoints for:
	- model family [<b>mistralai/Mistral-7B-Instruct-v0.2</b>](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
	- optimized with the loss [<b>KTO</b>](https://twitter.com/winniethexu/status/1732839295365554643)
	- aligned using the [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset)
	- via 3 iterations of KTO on one epoch of each training partition, each previous iteration's model serving as the reference for the subsequent.

	[03/06/2024]: We are #2 on the (verified) [Alpaca Eval 2.0 Leaderboard](https://tatsu-lab.github.io/alpaca_eval/) scoring 33.23!

	To prompt this model, ensure that the format is consistent with that of TuluV2.
	For example, a prompt should be formatted as follows, where `<\|user\|>` corresponds to the human's role and `<\|assistant\|>` corresponds to the LLM's role.
	The human should speak first:
	```

	<\|user\|>
	Hi! I'm looking for a cake recipe.
	<\|assistant\|>
	What kind of cake?
	<\|user\|>
	Chocolate cake.
	<\|assistant\|>
	```
	Note that a beginning-of-sequence (BOS) token is automatically added at tokenization time and does not have to be added by you. No end-of-sequence (EOS) token is added to the prompt.
	You may also use our tokenizer's `apply_chat_template` if doing inference with `chatml` set or evaluating generations through non-local clients.


	For more info on KTO refer to our [code repository](https://github.com/ContextualAI/HALOs) or [blog](https://contextual.ai/better-cheaper-faster-llm-alignment-with-kto/) for more details on the methodology.

	If you found this work useful, feel free to cite [our work](https://arxiv.org/abs/2402.01306):
	```
	@techreport{ethayarajh2023halos,
	author = {Ethayarajh, Kawin and Xu, Winnie, and Jurafsky, Dan and Kiela, Douwe},
	title = {Human-Centered Loss Functions (HALOs)},
	institution = {Contextual AI},
	note = {https://github.com/ContextualAI/HALOs/blob/main/assets/report.pdf},
	year = {2023},
	}
	```