Spaces: aquibmoin/AI-SciDoc-Evaluator

aquibmoin committed on Jul 23

Commit bdd0699 (verified) · 1 Parent(s): 3bf3829

Update app.py

Files changed (1)

app.py +14 -5

app.py CHANGED

@@ -81,12 +81,21 @@ def interpret_ragas_results_with_gpt(formatted_scores: list, llm) -> str:
     score_text = "\n".join([f"{k}: {v}" for k, v in formatted_scores[0].items()])
     prompt = f"""
-You are an expert in RAGAS evaluation metrics to evaluate AI-generated content. Based on the following RAGAS evaluation scores, provide a concise interpretation of each of the metric for the evaluation of AI-generated text. Write in a professional, clear, and objective tone.
+You are an expert in RAGAS evaluation metrics to evaluate AI-generated content.
+The following RAGAS evaluation scores are from a comparison between an AI-generated scientific case development document (SCDD) and a human-written version. This evaluation is conducted in the context of exploratory and novel scientific use cases — not strict academic summaries. The AI-generated document may include new ideas, restructured concepts, or facts not explicitly mentioned in the human reference.
+When interpreting the metrics, adopt a constructive and exploratory perspective. In particular:
+- **Lower factual correctness or accuracy scores or response groundedness scores** do not necessarily indicate factual errors. They may reflect the presence of new, valid information introduced by the AI that isn’t present in the human document.
+- **Semantic similarity** and **faithfulness** may vary due to phrasing, abstraction, or granularity, and should be considered within the context of novelty and creativity.
+- AI-generated document may be identifying gaps or elements missing from the human reference.
+- Interpret each score clearly, explaining both strengths and areas where alignment may differ, without penalizing innovation or deeper insight.
 RAGAS Scores:
 {score_text}
-Provide a paragraph-style interpretation.
+Provide a short paragraph interpretation for each metric.
 """
     response = llm.invoke(prompt)
@@ -107,7 +116,7 @@ def generate_word_report(science_goal, ragas_results, radar_chart_path, interpre
     doc.add_heading("RAGAS Metrics Chart", level=1)
     doc.add_picture(radar_chart_path, width=Inches(5))
-    doc.add_heading("GPT Interpretation", level=1)
+    doc.add_heading("GPT-4.1 Interpretation of RAGAS AI-SCDD Evaluation", level=1)
     doc.add_paragraph(interpretation)
     output_path = "SCDD_Evaluation_Report.docx"
@@ -169,9 +178,9 @@ interface = gr.Interface(
         gr.Textbox(label="Science Goal", placeholder="Enter science goal here..."),
     ],
     outputs=[
-        gr.JSON(label="RAGAS Scores"),
+        gr.JSON(label="RAGAS Evaluation Scores"),
         gr.Image(label="RAGAS Metrics Radar Chart"),
-        gr.Textbox(label="GPT Interpretation of RAGAS Results"),
+        gr.Textbox(label="GPT-4.1 Interpretation of RAGAS Results"),
         gr.File(label="Download Word Report")
     ],
     title="RAGAS Evaluation: AI vs Human SCDD",
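
Note on the prompt change: the first hunk formats only `formatted_scores[0]` into `score_text` and interpolates it into the prompt. The sketch below is a minimal, standalone illustration of that step, not the code in `app.py`. The metric names and values are made up, `EchoLLM` is a hypothetical stand-in that only assumes an `.invoke(prompt)` method like the one called in the diff, and the prompt is shortened to its last three lines.

```python
# Hypothetical stand-in for the real LLM client. The only assumption is an
# .invoke(prompt) method, mirroring the call visible in the diff.
class EchoLLM:
    def invoke(self, prompt: str) -> str:
        # Echo the prompt back so the example runs without any API access.
        return prompt


def build_score_text(formatted_scores: list) -> str:
    # Same expression as in the diff: only the first dict in the list is used.
    return "\n".join([f"{k}: {v}" for k, v in formatted_scores[0].items()])


# Illustrative scores; the metric names and values are assumptions.
formatted_scores = [{"faithfulness": 0.82, "semantic_similarity": 0.74}]

score_text = build_score_text(formatted_scores)

# Shortened version of the prompt from the diff (instruction text omitted).
prompt = f"""
RAGAS Scores:
{score_text}
Provide a short paragraph interpretation for each metric.
"""

response = EchoLLM().invoke(prompt)
print(response)
```

If `formatted_scores` ever holds more than one entry, everything after the first is silently ignored by this expression, which may or may not be intended.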