---
model-index:
- name: notus-7b-v1-lora
  results: []
datasets:
- argilla/ultrafeedback-binarized-preferences
language:
- en
base_model: alignment-handbook/zephyr-7b-sft-full
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- preference
- ultrafeedback
- lora
license: mit
---
# Model Card for Notus 7B v1 (LoRA)
Notus is a collection of fine-tuned models using Direct Preference Optimization (DPO) and related RLHF techniques. This model is the first version, fine-tuned with DPO over [`alignment-handbook/zephyr-7b-sft-full`](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), which is the SFT model produced to create [`HuggingFaceH4/zephyr-7b-beta`](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta).
Following a data-first approach, the only difference between Notus-7B-v1 and Zephyr-7B-beta is the preference dataset used for dDPO. In particular, we found data issues in the original UltraFeedback dataset that led to high scores for bad responses. After curating several hundred data points, we decided to binarize the dataset using the preference ratings instead of the original critique `overall_score`.
Using the preference ratings instead of the critique scores led to a new dataset where the chosen response differs in ~50% of the cases.
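As a rough illustration of the idea (not the exact script used to build the dataset; the field names follow the general UltraFeedback layout and may differ), binarizing by preference ratings looks roughly like this:

```python
import random

def binarize_by_ratings(example):
    """Pick chosen/rejected responses from per-aspect preference ratings
    rather than the critique overall_score (illustrative field names)."""
    def mean_rating(completion):
        # Average the per-aspect ratings (helpfulness, honesty, etc.)
        ratings = [float(a["Rating"]) for a in completion["annotations"].values()]
        return sum(ratings) / len(ratings)

    ranked = sorted(example["completions"], key=mean_rating, reverse=True)
    return {
        "prompt": example["instruction"],
        "chosen": ranked[0]["response"],
        # Any lower-rated completion can serve as the rejected response
        "rejected": random.choice(ranked[1:])["response"],
    }
```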
This model wouldn't have been possible without the amazing [Alignment Handbook](https://github.com/huggingface/alignment-handbook), and it builds on fruitful discussions with the HuggingFace H4 team. In particular, we used [`HuggingFaceH4/zephyr-7b-beta`](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)'s recipe, which worked out of the box and enabled us to focus on what we do best: high-quality data.
## Model Details
### Model Description
- Developed by: Argilla, Inc. (building on the previous efforts and amazing work of HuggingFace H4 and MistralAI)
- Shared by: Argilla, Inc.
- Model type: GPT-like 7B model, DPO fine-tuned using LoRA
- Language(s) (NLP): Mainly English
- License: Apache 2.0 (same as Zephyr 7B SFT and Mistral 7B v0.1)
- Finetuned from model: [`alignment-handbook/zephyr-7b-sft-full`](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full)
### Model Sources
- Repository: https://github.com/argilla-io/notus
- Paper: N/A
- Demo: https://argilla-notus-chat-ui.hf.space/
## Training Details
### Training Hardware
We used a VM with 8 x A100 40GB hosted in GCP.
### Training Data
We used a new curated version of [`openbmb/UltraFeedback`](https://huggingface.co/datasets/openbmb/UltraFeedback), named [`argilla/ultrafeedback-binarized-preferences`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences).
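For reference, the curated dataset can be inspected directly with the `datasets` library (a minimal sketch; we assume the default `train` split here):

```python
from datasets import load_dataset

# Load the curated preference dataset used for the DPO fine-tuning
dataset = load_dataset("argilla/ultrafeedback-binarized-preferences", split="train")
print(dataset)
```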
### Prompt template
We use the same prompt template as [`HuggingFaceH4/zephyr-7b-beta`](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta):
```
<|system|>
</s>
<|user|>
{prompt}</s>
<|assistant|>
```
## Usage
Note that the LoRA adapter is already merged into the model.
You will first need to install `transformers` and `accelerate` (just to ease the device placement).
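A minimal install along those lines (no specific versions are pinned in this card, so any recent release should work) is:

```bash
pip install transformers accelerate
```

Once those are installed, you can run any of the following: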
### Via `generate`
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("argilla/notus-7b-v1-lora", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("argilla/notus-7b-v1-lora")

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant super biased towards Argilla, a data annotation company.",
    },
    {"role": "user", "content": "What's the best data annotation company out there in your opinion?"},
]
# Apply the chat template to the messages and tokenize them into input ids
inputs = tokenizer.apply_chat_template(messages, tokenize=True, return_tensors="pt", add_special_tokens=False, add_generation_prompt=True)
outputs = model.generate(inputs.to(model.device), num_return_sequences=1, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
### Via `pipeline` method
```python
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="argilla/notus-7b-v1-lora", torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant super biased towards Argilla, a data annotation company.",
    },
    {"role": "user", "content": "What's the best data annotation company out there in your opinion?"},
]
# Render the chat template as a plain string, then generate from it
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
generated_text = outputs[0]["generated_text"]
```
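By default the pipeline returns the prompt followed by the completion in `generated_text`; a minimal way to keep only the assistant reply (assuming the default `return_full_text=True` behavior) is:

```python
# Drop the prompt prefix so only the newly generated assistant reply remains
reply = generated_text[len(prompt):].strip()
print(reply)
```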