Update README.md
README.md
CHANGED
@@ -131,7 +131,7 @@ The training was conducted on the NVIDIA DGX cluster with H100 GPUs, utilizing P
 
 ### Open-ended question generation
 
-To ensure a robust evaluation of our model's output quality, we employ the LLM-as-a-Judge approach using Prometheus-8x7b-v2.0. Our assessment uses carefully curated 4,000 publicly accessible healthcare-related questions, generating responses from various models. We then use Prometheus to conduct pairwise comparisons of the answers. Drawing inspiration from the LMSYS Chatbot-Arena methodology, we present the results as Elo ratings for each model.
+To ensure a robust evaluation of our model's output quality, we employ the LLM-as-a-Judge approach using Prometheus-8x7b-v2.0. Our assessment uses carefully curated 4,000 publicly accessible healthcare-related questions, generating responses from various models using the same prompt. We then use Prometheus to conduct pairwise comparisons of the answers. Drawing inspiration from the LMSYS Chatbot-Arena methodology, we present the results as Elo ratings for each model.
 
 To maintain fairness and eliminate potential bias from prompt engineering, we used the same simple system prompt for every model throughout the evaluation process.
 
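
For readers unfamiliar with how pairwise judge verdicts become Elo ratings, here is a minimal sketch of a sequential Elo update. It is not taken from the README: the function name `elo_ratings`, the K-factor of 32, the starting rating of 1000, and the `(model_a, model_b, winner)` input format are all assumptions made for illustration, and the authors' actual computation may differ.

```python
from collections import defaultdict


def elo_ratings(comparisons, k=32.0, base=1000.0):
    """Compute Elo ratings from pairwise judge verdicts.

    comparisons: iterable of (model_a, model_b, winner) tuples, where
    winner is "a", "b", or "tie". This format is hypothetical.
    k and base are assumed defaults, not values from the README.
    """
    ratings = defaultdict(lambda: base)
    score_for_a = {"a": 1.0, "b": 0.0, "tie": 0.5}

    for model_a, model_b, winner in comparisons:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a under the standard logistic Elo model.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        actual_a = score_for_a[winner]
        # Zero-sum update: whatever model_a gains, model_b loses.
        delta = k * (actual_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta

    return dict(ratings)


if __name__ == "__main__":
    # Placeholder model names and verdicts, for illustration only.
    verdicts = [
        ("model_x", "model_y", "a"),
        ("model_y", "model_z", "tie"),
        ("model_x", "model_z", "b"),
    ]
    for name, rating in sorted(elo_ratings(verdicts).items()):
        print(f"{name}: {rating:.1f}")
```

Sequential Elo depends on the order in which comparisons are processed, so a real evaluation would typically shuffle the verdicts and average over several passes, or fit all comparisons jointly.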