Commit 4f94bf4 · 1 Parent(s): 37a71b7
Update README.md
README.md CHANGED
@@ -22,6 +22,8 @@ It achieves the following results on the evaluation set:
 - Loss: 0.4810
 - Accuracy: 0.7869

+See also [vincentmin/llama-2-7b-reward-oasst1](https://huggingface.co/vincentmin/llama-2-13b-reward-oasst1) for a 7b version of this model.
+
 ## Model description

 This is a reward model trained with QLoRA in 4bit precision. The base model is [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) for which you need to have accepted the license in order to be able use it. Once you've been given permission, you can load the reward model as follows:
@@ -30,7 +32,7 @@ import torch
 from peft import PeftModel, PeftConfig
 from transformers import AutoModelForSequenceClassification, AutoTokenizer

-peft_model_id = "vincentmin/llama-2-
+peft_model_id = "vincentmin/llama-2-13b-reward-oasst1"
 config = PeftConfig.from_pretrained(peft_model_id)
 model = AutoModelForSequenceClassification.from_pretrained(
     config.base_model_name_or_path,
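
The second hunk ends partway through the `from_pretrained(...)` call, so the rest of the loading snippet is not visible in this diff. As a rough illustration only, the sketch below shows one way such a loader might be completed. The helper names `load_reward_model` and `score`, the `num_labels=1` argument, and the scoring step are assumptions and do not come from the commit. The input text format the reward model expects is also not shown here and is left to the caller.

```python
# Adapter id taken from the "+" line of the diff.
PEFT_MODEL_ID = "vincentmin/llama-2-13b-reward-oasst1"


def load_reward_model(peft_model_id: str = PEFT_MODEL_ID):
    """Hypothetical helper: load the base classifier and attach the adapter."""
    # Heavy dependencies are imported lazily so this module imports cheaply.
    from peft import PeftConfig, PeftModel
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # The adapter config records which base model it was trained on.
    config = PeftConfig.from_pretrained(peft_model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        config.base_model_name_or_path,
        num_labels=1,  # assumption: a single scalar reward output
    )
    # Wrap the base model with the trained adapter weights.
    model = PeftModel.from_pretrained(model, peft_model_id)
    model.eval()
    tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
    return model, tokenizer


def score(model, tokenizer, text: str) -> float:
    """Hypothetical helper: return the scalar reward logit for one text."""
    import torch

    device = next(model.parameters()).device
    inputs = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits[0, 0].item()
```

Calling `load_reward_model()` downloads the gated base model, so it only works once the license has been accepted and an authenticated Hugging Face session is available.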