---
model-index:
- name: notus-7b-dpo-lora
  results: []
datasets:
- argilla/ultrafeedback-binarized-avg-rating-for-dpo
language:
- en
base_model: alignment-handbook/zephyr-7b-sft-full
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- preference
- ultrafeedback
license: apache-2.0
---

# Model Card for Notus 7B

Notus will be a collection of models fine-tuned with Direct Preference Optimization (DPO), similar to Zephyr, but focused mainly on the DPO step, aiming to incorporate preference feedback into LLMs during fine-tuning. Notus models are intended to be used as assistants in chat-like applications, and are evaluated on the MT-Bench and AlpacaEval benchmarks so that they can be compared directly with the Zephyr models also fine-tuned with DPO.

## Model Details

### Model Description

- **Developed by:** Argilla, Inc. (building on the previous efforts and amazing work of HuggingFace H4 and MistralAI)
- **Shared by:** Argilla, Inc.
- **Model type:** GPT-like 7B model, DPO fine-tuned using LoRA
- **Language(s) (NLP):** Mainly English
- **License:** Apache 2.0 (same as Zephyr 7B SFT and Mistral 7B v0.1)
- **Finetuned from model:** [`alignment-handbook/zephyr-7b-sft-full`](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full)

### Model Sources [optional]

- **Repository:** https://github.com/argilla-io/notus-7b-dpo
- **Paper:** N/A
- **Demo:** https://argilla-notus-chat-ui.hf.space/

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

(See also the illustrative usage sketch at the end of this card, after the Training procedure section.)

## Training Details

### Training Data

As listed in the metadata above, the model was fine-tuned on [`argilla/ultrafeedback-binarized-avg-rating-for-dpo`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-avg-rating-for-dpo).

### Training Procedure

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

Notus models are evaluated on MT-Bench and AlpacaEval (see the description above); detailed scores are not yet reported.

### Results

[More Information Needed]

#### Summary

## Technical Specifications

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

8 x A100 40GB

#### Software

[More Information Needed]

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

## Training procedure
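The exact training recipe has not been published in this card yet. As a reference only, below is a minimal sketch of how a DPO + LoRA fine-tune of the base SFT model could be set up with TRL (0.7.x-style API) and PEFT. The hyperparameters (DPO `beta`, LoRA rank, learning rate, sequence lengths) and the assumption that the dataset exposes string `prompt`/`chosen`/`rejected` columns are ours; they are not the values actually used to train this model.

```python
# Illustrative DPO + LoRA sketch with TRL/PEFT -- NOT the exact recipe used for
# notus-7b-dpo-lora; hyperparameters and dataset column mapping are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

# DPOTrainer expects "prompt", "chosen" and "rejected" columns; we assume the
# Argilla dataset is already in (or has been mapped to) that format.
train_dataset = load_dataset(
    "argilla/ultrafeedback-binarized-avg-rating-for-dpo", split="train"
)

peft_config = LoraConfig(  # illustrative LoRA settings, not the released ones
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="notus-7b-dpo-lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,            # assumed; DPO typically uses small LRs
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
    remove_unused_columns=False,   # DPOTrainer builds its own batches
)

trainer = DPOTrainer(
    model,
    ref_model=None,   # with a PEFT config, TRL uses the frozen base weights as reference
    beta=0.1,         # assumed DPO temperature
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,
    max_prompt_length=512,
)
trainer.train()
trainer.save_model()
```

Passing `ref_model=None` together with a `peft_config` keeps memory at a single model copy: TRL computes the reference log-probabilities by disabling the LoRA adapters instead of loading a second 7B model.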
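Once the adapter is trained (or using the released weights, if published), the model can be tried as a chat assistant with a plain `transformers` text-generation pipeline. The hub id `argilla/notus-7b-dpo-lora` below is inferred from this card's `model-index` name and is an assumption; loading a LoRA-only repository through `pipeline` additionally requires `peft` to be installed so the adapter is resolved against the base model.

```python
# Illustrative usage sketch; "argilla/notus-7b-dpo-lora" is inferred from this
# card's metadata and may not match the final published repository id.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="argilla/notus-7b-dpo-lora",  # assumed hub id (LoRA adapter on the Zephyr SFT base)
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO fine-tuning in one sentence."},
]
# Format the conversation with the tokenizer's chat template (Zephyr-style).
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```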