Psychotherapy-LLM/PsychoCounsel-Llama3-8B-Reward


billmianz committed 5 days ago

Commit e7def3e (verified) · 1 parent: 6ca9fcc

Update README.md

Files changed (1)

README.md +1 -1

@@ -8,7 +8,7 @@ base_model:
 - meta-llama/Llama-3.1-8B-Instruct
 ---
-The reward model presented in the paper [Preference Learning Unlocks LLMs' Psycho-Counseling Skills](https://hf.co/papers/2502.19731).  It's a fine-tuned Llama 3 model trained using preference learning on the [PsychoCounsel-Preference](https://huggingface.co/datasets/Psychotherapy-LLM/PsychoCounsel-Preference) dataset.
+The reward model presented in the paper [Preference Learning Unlocks LLMs' Psycho-Counseling Skills](https://hf.co/papers/2502.19731).  It's a fine-tuned [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model trained using preference learning on the [PsychoCounsel-Preference](https://huggingface.co/datasets/Psychotherapy-LLM/PsychoCounsel-Preference) dataset.
 This policy model, [PsychoCounsel-Llama3-8B](https://huggingface.co/Psychotherapy-LLM/PsychoCounsel-Llama3-8B), trained with this model with online preference learning, achieves an impressive win rate of 87% against GPT-4o in psycho-counseling tasks.