Upload folder using huggingface_hub (#7)
- f4295322ed858a8fbf38688ef4f66fe4b5bc05fb7f546ae9264a472e5239f5d2 (34671fc2876506de8f6f20ea53f0bab25d591067)
- 8c9448225b6d39c35ae97c0ceffcf19f863386bf3080d08731fe757bf70fb1aa (80ea39f4383c81e642305c49d484f6564e665421)
- 54e3a9aa87dd59f13edf74bba2b3c2927cd0d30b14a35927e0492aa33f641ffa (012fd560c29ff6c4fdb977209db5f0dfc457424f)
- 9e84c48d007060e25931bf9b53241145366debd7948b69fa07bc26a73193e3e9 (34e3a0befee318ed879a2681ebd8912d2f648cf4)
- a0d6ad42d56fbe1cde7c72a9d567995eec85083a6d3fb172fd68eb8c2285878e (b4d9c6ab2c50523eee3a47044025f3d19cf8076e)
- b1744ae213ec8db6a8f4096d511a09be14bfbb99e690c06d10a0e4f560319a3b (313a325ce0a1d7fc46cc27d6f118ff6ebb4453ed)
- e2f8ff30e35595fc88c47f4940b7287a57f57b3172542305244b7551db1dda8a (42c8b2c97ec6bcdf07e1a8a239296b8acf76ef8f)
- 6fd424449d2a16ad785e7289c4a376277b3c94c6a13af7c05c8c78bd00507d2f (dcf8c5403e53a344c6ef0824632a6514344434e7)
@@ -33,6 +33,7 @@ with torch.no_grad():
 # reward: 0.76
 ```
 模型可以较为准确地判断文本重复,异常中断和不符合指令要求等低质量模型生成结果,并给出较低的奖励值。
+
 The model can more accurately determine low quality model generation results such as text repetition, interruptions and failure to meet instruction requirements, and give lower reward values.
 
 ```python
@@ -52,8 +53,11 @@ with torch.no_grad():
 print(reward.tolist())
 #reward: [0.76, -1.36, -2.99, -1.82]
 ```
+
 模型能够对比对同一指令的不同生成结果,并根据质量给出奖励值。
+
 The model is able to compare different generation results for the same instruction and give reward values based on quality.
+
 ```python
 prefix_user = "Human:"
 prefix_bot = "\n\nAssistant:"