Commit · 718ef20
Parent(s): aac7e40

Update README.md

README.md CHANGED
```diff
@@ -2,7 +2,6 @@
 model-index:
 - name: notus-7b-dpo-lora
   results: []
-license: mit
 datasets:
 - argilla/ultrafeedback-binarized-avg-rating-for-dpo
 language:
@@ -10,36 +9,37 @@ language:
 base_model: alignment-handbook/zephyr-7b-sft-full
 library_name: transformers
 pipeline_tag: text-generation
+tags:
+- dpo
+- preference
+- ultrafeedback
+license: apache-2.0
 ---
 
-# Model Card for
-
-<!-- Provide a quick summary of what the model is/does. -->
-
+# Model Card for Notus 7B
 
+Notus is going to be a collection of fine-tuned models using DPO, similarly to Zephyr, but mainly focused
+on the Direct Preference Optimization (DPO) step, aiming to incorporate preference feedback into the LLMs
+when fine-tuning those. Notus models are intended to be used as assistants via chat-like applications, and
+are evaluated with the MT-Bench and AlpacaEval benchmarks, to be directly compared with Zephyr fine-tuned models
+also using DPO.
 
 ## Model Details
 
 ### Model Description
 
-
-
-
-
-- **Developed by:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
+- **Developed by:** Argilla, Inc. (based on HuggingFace H4 and MistralAI previous efforts and amazing work)
+- **Shared by:** Argilla, Inc.
+- **Model type:** GPT-like 7B model DPO fine-tuned using LoRA
+- **Language(s) (NLP):** Mainly English
+- **License:** Apache 2.0 (same as Zephyr 7B SFT and Mistral 7B v0.1)
+- **Finetuned from model:** [`alignment-handbook/zephyr-7b-sft-full`](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full)
 
 ### Model Sources [optional]
 
-
-
-- **Repository:** https://github.com/argilla-io/notus-7b-dpo
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
+- **Repository:** https://github.com/argilla-io/notus-7b-dpo
+- **Paper:** N/A
+- **Demo:** https://argilla-notus-chat-ui.hf.space/
 
 ## Uses
 
@@ -139,26 +139,7 @@ Use the code below to get started with the model.
 #### Summary
 
 
-
-## Model Examination [optional]
-
-<!-- Relevant interpretability work for the model goes here -->
-
-[More Information Needed]
-
-## Environmental Impact
-
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-
-## Technical Specifications [optional]
+## Technical Specifications
 
 ### Model Architecture and Objective
 
@@ -170,7 +151,7 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 
 #### Hardware
 
-
+8 x A100 40GB
 
 #### Software
 
@@ -206,11 +187,4 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 
 [More Information Needed]
 
-
 ## Training procedure
-
-
-### Framework versions
-
-
-- PEFT 0.6.1
```
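The card the commit updates centers on the Direct Preference Optimization step. As background (not part of the commit itself), here is a minimal pure-Python sketch of the per-pair DPO objective: the loss is the negative log-sigmoid of β times the gap between the policy/reference log-ratios of the chosen and rejected completions. All log-probabilities below are made-up illustrative numbers.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair:
    -log sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l)))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)); fine for a sketch, not numerically hardened
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative numbers: the policy favors the chosen completion more than
# the reference does, so the loss dips below -log(0.5) ≈ 0.693.
loss = dpo_loss(-10.0, -12.0, -11.0, -11.5, beta=0.1)
```

When policy and reference agree exactly, the margin is zero and the loss sits at log 2; training pushes the margin positive on each preference pair.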
|
|
|
|
|
|
|
|
|
|
|
|
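The new metadata describes the model as "DPO fine-tuned using LoRA" (with PEFT listed among the framework versions). As a back-of-the-envelope illustration of why that keeps the DPO step cheap, the sketch below counts trainable parameters for a single adapted weight matrix; the 4096×4096 shape and rank 16 are hypothetical, not taken from the card.

```python
def lora_param_counts(d_in, d_out, rank):
    """Parameters touched when adapting one d_in x d_out weight:
    full fine-tuning trains d_in*d_out values, while LoRA trains only
    A (rank x d_in) plus B (d_out x rank) and leaves W frozen."""
    full = d_in * d_out
    lora = rank * d_in + d_out * rank
    return full, lora

# Hypothetical attention projection: 4096x4096 at rank 16.
full, lora = lora_param_counts(4096, 4096, 16)
ratio = lora / full  # fraction of weights actually trained (< 1%)
```

At these illustrative sizes the adapter trains 131,072 parameters versus 16,777,216 for the full matrix, i.e. under one percent, which is what makes DPO over a 7B base practical on the 8 x A100 40GB setup the card lists.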