Update app.py
app.py CHANGED
@@ -83,14 +83,14 @@ def interpret_ragas_results_with_gpt(formatted_scores: list, llm) -> str:
     prompt = f"""
 You are an expert in RAGAS evaluation metrics to evaluate AI-generated content.
 
-The following RAGAS evaluation scores are from a comparison between an AI-generated scientific case development document (SCDD) and a human-written version. This evaluation is conducted in the context of exploratory and novel scientific use cases
-
-
-
--
--
--
--
+The following RAGAS evaluation scores are from a comparison between an AI-generated scientific case development document (SCDD) and a human-written version. This evaluation is conducted in the context of exploratory and novel scientific use cases. The AI-generated document may include new ideas, restructured concepts, or facts not explicitly mentioned in the human reference.
+
+Interpret these scores with a **balanced and critical lens**:
+- Acknowledge that the AI output may contain exploratory and novel content.
+- However, evaluate the scores in light of both strengths and **potential weaknesses** or limitations.
+- Consider how novelty, phrasing differences, or omissions might impact factual and alignment-based metrics.
+- Cover both aspects of novelty. That is, new insights as well as inaccuracies.
+- Do **not** start with phrases like "Certainly" or "Here's..." — **begin directly with the interpretation**.
 
 RAGAS Scores:
 {score_text}
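The hunk shows only the prompt template, not how `score_text` is produced or how `llm` is called. The following is a minimal sketch of how such a prompt could be assembled, under stated assumptions: `build_interpretation_prompt` is a hypothetical helper, `formatted_scores` is assumed to hold printable score lines, and the sample metric values are made up for illustration. The instruction list is abbreviated, and the call to `llm` is left out because its interface does not appear in the diff.

```python
from textwrap import dedent


def build_interpretation_prompt(formatted_scores: list) -> str:
    """Hypothetical helper: assemble an interpretation prompt from score lines."""
    # Assumption: each entry is already printable, e.g. "faithfulness: 0.71".
    score_text = "\n".join(str(score) for score in formatted_scores)

    # The instructions are dedented on their own; the scores are appended
    # afterwards so their multi-line content cannot disturb the dedent step.
    instructions = dedent(
        """
        You are an expert in RAGAS evaluation metrics to evaluate AI-generated content.

        Interpret these scores with a **balanced and critical lens**:
        - Acknowledge that the AI output may contain exploratory and novel content.
        - Cover both aspects of novelty. That is, new insights as well as inaccuracies.

        RAGAS Scores:
        """
    ).strip()
    return f"{instructions}\n{score_text}"


if __name__ == "__main__":
    # Illustrative values only, not results from the actual evaluation.
    print(build_interpretation_prompt(["faithfulness: 0.71", "answer_relevancy: 0.64"]))
```

Keeping the template separate from the interpolated scores makes the prompt text easy to unit-test without invoking a model.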