Update README.md
README.md
CHANGED
@@ -8,7 +8,7 @@ base_model:
 - meta-llama/Llama-3.1-8B-Instruct
 ---
 
-The reward model presented in the paper [Preference Learning Unlocks LLMs' Psycho-Counseling Skills](https://hf.co/papers/2502.19731). It's a fine-tuned Llama
+The reward model presented in the paper [Preference Learning Unlocks LLMs' Psycho-Counseling Skills](https://hf.co/papers/2502.19731). It's a fine-tuned [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model trained using preference learning on the [PsychoCounsel-Preference](https://huggingface.co/datasets/Psychotherapy-LLM/PsychoCounsel-Preference) dataset.
 This policy model, [PsychoCounsel-Llama3-8B](https://huggingface.co/Psychotherapy-LLM/PsychoCounsel-Llama3-8B), trained with this model with online preference learning, achieves an impressive win rate of 87% against GPT-4o in psycho-counseling tasks.
 
 