Cyrile committed
Commit: 28765ca
Parent: 92b2f25

Update README.md

Files changed (1): README.md (+1 -1)
README.md CHANGED

```diff
@@ -40,7 +40,7 @@ essential.
 
 The bloomz-3b-dpo-chat model was trained using the [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset, which includes:
 
-**Human Preference Data:**
+Human Preference Data:
 - **Description:** Annotations of helpfulness and harmlessness, with each entry containing "chosen" and "rejected" text pairs.
 - **Purpose:** To train preference models for Reinforcement Learning from Human Feedback (RLHF), not for supervised training of dialogue agents.
 - **Source:** Data from context-distilled language models, rejection sampling, and an iterated online process.
```
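The "chosen"/"rejected" pairs described above are exactly the inputs that Direct Preference Optimization (DPO, the "dpo" in bloomz-3b-dpo-chat) consumes. As a rough illustration — not the repository's actual training code, and with illustrative names and an assumed `beta` — the per-pair DPO objective can be sketched as:

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one "chosen"/"rejected" preference pair.

    Inputs are summed log-probabilities of each response under the
    trained policy (pi_*) and a frozen reference model (ref_*).
    Names and beta are illustrative, not taken from the model card.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response than the reference model does.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log sigmoid(beta * margin): small when the margin is large.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# The loss shrinks as the policy learns to favor "chosen" over
# "rejected" relative to the reference model.
weak = dpo_loss(-10.0, -10.0, -10.0, -10.0)   # no preference yet
strong = dpo_loss(-8.0, -12.0, -10.0, -10.0)  # policy favors "chosen"
```

With a zero margin the loss is `-log 0.5`; as the policy's preference for the chosen response grows, the loss decreases toward zero, which is why annotating pairs rather than single responses is enough to train the model.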