vincentmin committed
Commit · f0db376
1 Parent(s): 9877e68
Update README.md
README.md CHANGED
@@ -20,6 +20,9 @@ It achieves the following results on the evaluation set:
 - Loss: 0.5713
 - Accuracy: 0.7435
 
+See also [vincentmin/llama-2-13b-reward-oasst1](https://huggingface.co/vincentmin/llama-2-13b-reward-oasst1) for a 13b version of this model.
+
+
 ## Model description
 
 This is a reward model trained with QLoRA in 4bit precision. The base model is [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) for which you need to have accepted the license in order to be able use it. Once you've been given permission, you can load the reward model as follows: