vincentmin committed
Commit 4f94bf4 · 1 Parent(s): 37a71b7

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -22,6 +22,8 @@ It achieves the following results on the evaluation set:
 - Loss: 0.4810
 - Accuracy: 0.7869
 
+See also [vincentmin/llama-2-7b-reward-oasst1](https://huggingface.co/vincentmin/llama-2-7b-reward-oasst1) for a 7b version of this model.
+
 ## Model description
 
 This is a reward model trained with QLoRA in 4-bit precision. The base model is [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), for which you need to have accepted the license in order to be able to use it. Once you've been given permission, you can load the reward model as follows:
@@ -30,7 +32,7 @@ import torch
 from peft import PeftModel, PeftConfig
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
 
-peft_model_id = "vincentmin/llama-2-7b-reward-oasst1"
+peft_model_id = "vincentmin/llama-2-13b-reward-oasst1"
 config = PeftConfig.from_pretrained(peft_model_id)
 model = AutoModelForSequenceClassification.from_pretrained(
     config.base_model_name_or_path,
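
The loading snippet in the diff is truncated at the hunk boundary. For context, here is a minimal end-to-end sketch of the flow the README describes; everything past `config.base_model_name_or_path` (the `num_labels=1` reward head, the dtype, the `PeftModel` wrapping, the tokenizer, and the example input format) is an assumption based on typical PEFT reward-model usage, not text from this commit.

```python
# Minimal sketch of the full loading flow implied by the README snippet.
# Everything below the diff's truncation point is assumed, not taken from
# this commit: num_labels=1, torch_dtype, the PeftModel wrapping, the
# tokenizer, and the example input are illustrative.
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSequenceClassification, AutoTokenizer

peft_model_id = "vincentmin/llama-2-13b-reward-oasst1"
config = PeftConfig.from_pretrained(peft_model_id)

# Base model with a single-logit classification head acting as the reward head.
model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    num_labels=1,
    torch_dtype=torch.float16,
)

# Attach the QLoRA adapter weights from the reward-model repo.
model = PeftModel.from_pretrained(model, peft_model_id)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Score a (prompt, response) pair; a higher logit means a higher reward.
# The "prompter:"/"assistant:" format is an assumed OASST-style layout.
text = "prompter: What is a reward model? assistant: A reward model scores responses."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits[0]
print(reward)
```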