Cameron-Chen committed
Commit 4f82cba · 1 Parent(s): 5ae3103
Update README.md
README.md CHANGED
@@ -9,9 +9,9 @@ language:
 
 <!-- This is a model released from the preprint: *[Bootstrapping Language Models with DPO Implicit Rewards](https://arxiv.org/abs/2406.09760)*. Please refer to our [repository](https://github.com/sail-sg/dice) for more details. -->
 
-# Llama-3-Base-8B-DICE-
+# Llama-3-Base-8B-DICE-Iter1
 
-This model was developed using [Bootstrapping Language Models with DPO Implicit Rewards](https://arxiv.org/abs/2406.09760) (DICE) at iteration
+This model was developed using [Bootstrapping Language Models with DPO Implicit Rewards](https://arxiv.org/abs/2406.09760) (DICE) at iteration 1, with [princeton-nlp/Llama-3-Base-8B-SFT-DPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-DPO) as the starting point.
 
 <!-- We utilized the prompt sets extracted from [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized). -->
 
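For background, the DICE method named in this commit derives its preference signal from the DPO implicit reward, r(x, y) = β · [log π(y|x) − log π_ref(y|x)], where π is the current policy and π_ref the reference model. Below is a minimal sketch of computing that reward for a single (prompt, response) pair with the standard `transformers` API. The policy repo id `sail/Llama-3-Base-8B-DICE-Iter1` and `beta = 0.1` are illustrative assumptions, not values taken from this card; the reference id is the starting-point model the card names.

```python
# Sketch: DPO implicit reward r(x, y) = beta * (log pi(y|x) - log pi_ref(y|x)),
# the scoring signal DICE bootstraps between iterations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

policy_id = "sail/Llama-3-Base-8B-DICE-Iter1"      # assumed repo id (illustrative)
ref_id = "princeton-nlp/Llama-3-Base-8B-SFT-DPO"   # starting point per the card

tokenizer = AutoTokenizer.from_pretrained(ref_id)
policy = AutoModelForCausalLM.from_pretrained(policy_id, torch_dtype=torch.bfloat16)
ref = AutoModelForCausalLM.from_pretrained(ref_id, torch_dtype=torch.bfloat16)

def response_logprob(model, prompt: str, response: str) -> torch.Tensor:
    """Sum of token log-probs of `response` given `prompt`.

    Assumes the prompt's tokenization is a prefix of the full tokenization,
    which holds approximately for Llama-style tokenizers.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-prob of each token conditioned on its prefix (shift targets by one).
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the response tokens.
    return token_lp[:, prompt_ids.shape[1] - 1 :].sum()

beta = 0.1  # assumed; a common DPO setting, not stated in this card
prompt, response = "What is DPO?\n", "Direct Preference Optimization is ..."
reward = beta * (
    response_logprob(policy, prompt, response)
    - response_logprob(ref, prompt, response)
)
print(f"DPO implicit reward: {reward.item():.4f}")
```

In DICE, rewards of this form are used to rank sampled responses and build new preference pairs for the next DPO round; see the paper and the sail-sg/dice repository for the actual pipeline.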