Space: aquibmoin/AI-SciDoc-Evaluator
aquibmoin committed on Jul 23

Commit 446c9f7 (verified) · 1 parent: bdd0699

Update app.py

Files changed (1)

app.py CHANGED

@@ -83,14 +83,14 @@ def interpret_ragas_results_with_gpt(formatted_scores: list, llm) -> str:
     prompt = f"""
 You are an expert in RAGAS evaluation metrics to evaluate AI-generated content.
-The following RAGAS evaluation scores are from a comparison between an AI-generated scientific case development document (SCDD) and a human-written version. This evaluation is conducted in the context of exploratory and novel scientific use cases — not strict academic summaries. The AI-generated document may include new ideas, restructured concepts, or facts not explicitly mentioned in the human reference.
-When interpreting the metrics, adopt a constructive and exploratory perspective. In particular:
-- **Lower factual correctness or accuracy scores or response groundedness scores** do not necessarily indicate factual errors. They may reflect the presence of new, valid information introduced by the AI that isn’t present in the human document.
-- **Semantic similarity** and **faithfulness** may vary due to phrasing, abstraction, or granularity, and should be considered within the context of novelty and creativity.
-- AI-generated document may be identifying gaps or elements missing from the human reference.
-- Interpret each score clearly, explaining both strengths and areas where alignment may differ, without penalizing innovation or deeper insight.
 RAGAS Scores:
 {score_text}

     prompt = f"""
 You are an expert in RAGAS evaluation metrics to evaluate AI-generated content.
+The following RAGAS evaluation scores are from a comparison between an AI-generated scientific case development document (SCDD) and a human-written version. This evaluation is conducted in the context of exploratory and novel scientific use cases. The AI-generated document may include new ideas, restructured concepts, or facts not explicitly mentioned in the human reference.
+Interpret these scores with a **balanced and critical lens**:
+- Acknowledge that the AI output may contain exploratory and novel content.
+- However, evaluate the scores in light of both strengths and **potential weaknesses** or limitations.
+- Consider how novelty, phrasing differences, or omissions might impact factual and alignment-based metrics.
+- Cover both aspects of novelty. That is, new insights as well as inaccuracies.
+- Do **not** start with phrases like "Certainly" or "Here's..." — **begin directly with the interpretation**.
 RAGAS Scores:
 {score_text}
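
The hunk above does not show how `score_text` is built, or what happens if the model ignores the new "begin directly with the interpretation" instruction. The following is a minimal sketch of both pieces, not the code in app.py. The helper names `build_score_text` and `strip_filler_opening`, the one-score-per-line format, and the list of filler openers are all assumptions made for illustration.

```python
import re

# Hypothetical helper: the diff does not show how `score_text` is produced.
# Assumes each entry of `formatted_scores` is already a printable score line.
def build_score_text(formatted_scores: list) -> str:
    """Join pre-formatted score entries into one block, one entry per line."""
    return "\n".join(str(s) for s in formatted_scores)

# Assumed set of filler openers; the prompt only names "Certainly" and "Here's...".
_FILLER = re.compile(
    r"^\s*(certainly|sure|here(?:'|\u2019)s|here is)\b[^\n]*\n+",
    re.IGNORECASE,
)

def strip_filler_opening(text: str) -> str:
    """Drop one leading filler line (e.g. "Certainly! ...") if the reply has one.

    This is a crude fallback: it removes the whole first line when that line
    starts with a filler word and is followed by a newline.
    """
    return _FILLER.sub("", text, count=1).lstrip()
```

A caller could pass the result of `build_score_text(formatted_scores)` into the f-string shown in the diff, then run the model's reply through `strip_filler_opening` before displaying it. Because the guard drops the entire first line, it would also discard any substantive content that happens to follow a filler word on that line.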