Update README.md
README.md
CHANGED
@@ -5,7 +5,7 @@ tags: []
 
 ## Description
 
-Gemma-2-9b-it model finetuned by off-
+Gemma-2-9b-it model finetuned by off-policy WPO. Details in [WPO: Enhancing RLHF with Weighted Preference Optimization](https://arxiv.org/abs/2406.11827).
 
 ## License
 
 This model is licensed under the Zoom software license and is permitted for use only for noncommercial, educational, or academic research purposes.