Upload folder using huggingface_hub (#7)

- f4295322ed858a8fbf38688ef4f66fe4b5bc05fb7f546ae9264a472e5239f5d2 (34671fc2876506de8f6f20ea53f0bab25d591067)
- 8c9448225b6d39c35ae97c0ceffcf19f863386bf3080d08731fe757bf70fb1aa (80ea39f4383c81e642305c49d484f6564e665421)
- 54e3a9aa87dd59f13edf74bba2b3c2927cd0d30b14a35927e0492aa33f641ffa (012fd560c29ff6c4fdb977209db5f0dfc457424f)
- 9e84c48d007060e25931bf9b53241145366debd7948b69fa07bc26a73193e3e9 (34e3a0befee318ed879a2681ebd8912d2f648cf4)
- a0d6ad42d56fbe1cde7c72a9d567995eec85083a6d3fb172fd68eb8c2285878e (b4d9c6ab2c50523eee3a47044025f3d19cf8076e)
- b1744ae213ec8db6a8f4096d511a09be14bfbb99e690c06d10a0e4f560319a3b (313a325ce0a1d7fc46cc27d6f118ff6ebb4453ed)
- e2f8ff30e35595fc88c47f4940b7287a57f57b3172542305244b7551db1dda8a (42c8b2c97ec6bcdf07e1a8a239296b8acf76ef8f)
- 6fd424449d2a16ad785e7289c4a376277b3c94c6a13af7c05c8c78bd00507d2f (dcf8c5403e53a344c6ef0824632a6514344434e7)

Files changed (1)

README.md CHANGED

@@ -33,6 +33,7 @@ with torch.no_grad():
     # reward: 0.76
 ```
 The model fairly accurately identifies low-quality generations, such as repeated text, abnormally truncated output, and responses that do not follow the instruction, and assigns them lower reward values.
 ```python
@@ -52,8 +53,11 @@ with torch.no_grad():
     print(reward.tolist())
     #reward: [0.76, -1.36, -2.99, -1.82]
 ```
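
The reward list printed above lends itself to ranking candidate responses. The following is a minimal sketch of that idea, not part of the README: `rank_by_reward` and the placeholder response labels are hypothetical names introduced here, and only the four reward values are taken from the example output.

```python
from typing import List, Tuple


def rank_by_reward(responses: List[str], rewards: List[float]) -> List[Tuple[str, float]]:
    """Pair each response with its reward and sort from highest to lowest reward."""
    if len(responses) != len(rewards):
        raise ValueError("responses and rewards must have the same length")
    return sorted(zip(responses, rewards), key=lambda pair: pair[1], reverse=True)


# Placeholder labels (hypothetical); the reward values are those printed in the example above.
responses = ["response_0", "response_1", "response_2", "response_3"]
rewards = [0.76, -1.36, -2.99, -1.82]

ranked = rank_by_reward(responses, rewards)
best_response, best_reward = ranked[0]
print(best_response, best_reward)
```

Sorting by reward keeps the scoring step separate from the selection step, so the same helper works whether the scores come from one batch or several.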
 The model can compare different generations for the same instruction and assign each a reward value according to its quality.
 ```python
 prefix_user = "Human:"
 prefix_bot = "\n\nAssistant:"
