aquibmoin committed on
Commit 446c9f7 · verified · 1 Parent(s): bdd0699

Update app.py

Files changed (1)
  1. app.py +8 -8
app.py CHANGED
@@ -83,14 +83,14 @@ def interpret_ragas_results_with_gpt(formatted_scores: list, llm) -> str:
  prompt = f"""
  You are an expert in RAGAS evaluation metrics to evaluate AI-generated content.
 
- The following RAGAS evaluation scores are from a comparison between an AI-generated scientific case development document (SCDD) and a human-written version. This evaluation is conducted in the context of exploratory and novel scientific use cases — not strict academic summaries. The AI-generated document may include new ideas, restructured concepts, or facts not explicitly mentioned in the human reference.
-
- When interpreting the metrics, adopt a constructive and exploratory perspective. In particular:
-
- - **Lower factual correctness or accuracy scores or response groundedness scores** do not necessarily indicate factual errors. They may reflect the presence of new, valid information introduced by the AI that isn’t present in the human document.
- - **Semantic similarity** and **faithfulness** may vary due to phrasing, abstraction, or granularity, and should be considered within the context of novelty and creativity.
- - AI-generated document may be identifying gaps or elements missing from the human reference.
- - Interpret each score clearly, explaining both strengths and areas where alignment may differ, without penalizing innovation or deeper insight.
+ The following RAGAS evaluation scores are from a comparison between an AI-generated scientific case development document (SCDD) and a human-written version. This evaluation is conducted in the context of exploratory and novel scientific use cases. The AI-generated document may include new ideas, restructured concepts, or facts not explicitly mentioned in the human reference.
+
+ Interpret these scores with a **balanced and critical lens**:
+ - Acknowledge that the AI output may contain exploratory and novel content.
+ - However, evaluate the scores in light of both strengths and **potential weaknesses** or limitations.
+ - Consider how novelty, phrasing differences, or omissions might impact factual and alignment-based metrics.
+ - Cover both aspects of novelty. That is, new insights as well as inaccuracies.
+ - Do **not** start with phrases like "Certainly" or "Here's..." **begin directly with the interpretation**.
 
  RAGAS Scores:
  {score_text}
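
The diff shows only the prompt text, not how `{score_text}` is filled in. The following is a minimal sketch of one way the revised prompt could be assembled. The helper name `build_interpretation_prompt` and the shape of `formatted_scores` (a list of dicts with `metric` and `score` keys) are assumptions for illustration and do not come from app.py; the instruction wording is abridged from the added lines above.

```python
def build_interpretation_prompt(formatted_scores: list) -> str:
    """Assemble a balanced-interpretation prompt (hypothetical helper, not from app.py)."""
    # Assumption: each entry looks like {"metric": "faithfulness", "score": 0.82}.
    score_text = "\n".join(
        f"{item['metric']}: {item['score']:.2f}" for item in formatted_scores
    )
    # Instruction wording abridged from the lines added in this commit.
    return (
        "You are an expert in RAGAS evaluation metrics to evaluate AI-generated content.\n\n"
        "Interpret these scores with a **balanced and critical lens**:\n"
        "- Acknowledge that the AI output may contain exploratory and novel content.\n"
        "- However, evaluate the scores in light of both strengths and "
        "**potential weaknesses** or limitations.\n\n"
        "RAGAS Scores:\n"
        f"{score_text}\n"
    )


if __name__ == "__main__":
    example_scores = [
        {"metric": "faithfulness", "score": 0.82},
        {"metric": "semantic_similarity", "score": 0.9},
    ]
    print(build_interpretation_prompt(example_scores))
```

Building the prompt in a separate function like this would make it possible to check the wording without calling the LLM, though the actual app.py may structure this differently.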