added config in readme
README.md
---
title: Truthfulness Checker
emoji: 📰
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
license: apache-2.0
---

### **Implementation Steps: Validating Information with Context**

Validating the accuracy or degree of truthfulness of a given piece of information requires **context**—factual and relevant details surrounding the claim. Here’s how we approach the process step by step:

---

### **Step 1: Retrieving Context from a Knowledge Graph Substitute - FAISS with Semantic Search**

Instead of relying on a **traditional Knowledge Graph (KG)**, we use **FAISS (Facebook AI Similarity Search)**, a **faster, scalable, and flexible alternative** for semantic search.

#### **Why FAISS is Better than a Traditional KG**
1. **Sentence-Level Retrieval**: Unlike traditional KGs, which often rely on pre-defined **entities and relationships**, FAISS uses dense **embeddings** to directly match the **semantic meaning** of entire sentences.
2. **Scalable and High-Speed Retrieval**: FAISS efficiently handles **millions of embeddings**, making it highly scalable for real-world applications.
3. **Flexibility**: It works with **unstructured text**, removing the need to pre-process information into entities and relations, which is often time-consuming.
4. **Generalization**: FAISS enables **approximate nearest neighbor (ANN) search**, allowing retrieval of contextually related results even if they are not exact matches.

#### **Dataset Used**
We leverage the **News Category Dataset** ([Kaggle Link](https://www.kaggle.com/datasets/rmisra/news-category-dataset)), which contains **news headlines and short descriptions** across various categories.

- **Why This Dataset?**
  - It covers a **wide range of topics**, making it useful for general-purpose context building.
  - Headlines and descriptions provide **rich semantic embeddings** for similarity searches.
  - Categories allow filtering relevant results if required (e.g., "science" or "technology").

**Process:**
1. We use **SentenceTransformer (all-MiniLM-L6-v2)** to generate embeddings for the query (the input news).
2. We search against pre-computed embeddings stored in a **FAISS index** to retrieve the **top-K most relevant entries**.
3. These results form the **initial context**, capturing related information already present in the dataset.
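
The snippet below is a minimal sketch of this retrieval step, assuming the dataset texts are already loaded into a Python list; the helper names (`build_index`, `retrieve_context`) are illustrative and not taken from `app.py`.

```python
# Minimal sketch of Step 1: embed the dataset once, index it with FAISS,
# and pull the top-K entries most similar to the input news.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(documents: list[str]) -> faiss.IndexFlatIP:
    """Pre-compute embeddings and store them in an inner-product FAISS index."""
    embeddings = encoder.encode(documents, normalize_embeddings=True)
    index = faiss.IndexFlatIP(embeddings.shape[1])  # cosine similarity on normalized vectors
    index.add(np.asarray(embeddings, dtype="float32"))
    return index

def retrieve_context(query: str, index: faiss.IndexFlatIP, documents: list[str], k: int = 5) -> list[str]:
    """Return the top-K dataset entries that are semantically closest to the query."""
    query_vec = encoder.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(query_vec, dtype="float32"), k)
    return [documents[i] for i in ids[0] if i != -1]
```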

---

### **Step 2: Online Search for Real-Time Context**

To **augment** the context retrieved from FAISS, we incorporate **real-time online search** using an API.

#### **Why Online Search is Critical**
- **Fresh Information**: News and facts evolve, especially in areas like **science, technology, or politics**. Online search ensures access to the **latest updates** that may not exist in the static dataset.
- **Diverse Sources**: It broadens the scope by pulling information from **multiple credible sources**, reducing bias and enhancing reliability.
- **Fact-Checking**: Search engines often index **trusted fact-checking websites** that we can incorporate into the context.

**Process:**
1. Query the API with a **search query** derived from the input news.
2. Retrieve relevant snippets, headlines, or summaries.
3. Append these results to the **context** built using FAISS.
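
Since the README does not name a specific search provider, the sketch below uses a placeholder endpoint and response shape; `SEARCH_API_URL`, the `results`/`snippet` fields, and the `online_search` helper are assumptions for illustration only.

```python
# Hypothetical sketch of Step 2: fetch fresh snippets for the input news.
# The endpoint and JSON fields are placeholders, not a specific provider's API.
import requests

SEARCH_API_URL = "https://example.com/search"  # assumed endpoint

def online_search(query: str, max_results: int = 5) -> list[str]:
    """Return plain-text snippets for the query from the placeholder search API."""
    response = requests.get(
        SEARCH_API_URL,
        params={"q": query, "num": max_results},
        timeout=10,
    )
    response.raise_for_status()
    payload = response.json()
    return [item.get("snippet", "") for item in payload.get("results", [])]
```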

---

### **Step 3: Building Context from Combined Sources**

The FAISS-based retrieval results and the **online search results** are combined into a **single context string**. This provides a **comprehensive knowledge base** around the input information.

- **Why Combine Both?**
  - FAISS offers **pre-indexed knowledge**—ideal for **static facts** or concepts.
  - Online search complements it with **dynamic and up-to-date insights**—perfect for verifying **recent developments**.

This layered context improves the model’s ability to assess the **truthfulness** of the given information.
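
As a rough sketch (the `build_context` helper is assumed, not taken from the app), the combination can be as simple as joining both result lists into one string before classification:

```python
# Sketch of Step 3: merge static (FAISS) and dynamic (online) evidence
# into the single context string used for classification.
def build_context(faiss_results: list[str], search_results: list[str]) -> str:
    return "\n".join(faiss_results + search_results)
```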

---

### **Step 4: Truthfulness Prediction with a Zero-Shot Classification Model**

We use **facebook/bart-large-mnli**, a **zero-shot classification** model, for the evaluation.

#### **Why BART-Large-MNLI?**
1. **Zero-Shot Capability**: It can handle claims and hypotheses without needing **task-specific training**—perfect for this flexible, multi-domain use case.
2. **Contextual Matching**: It compares the input claim (news) with the constructed context to assess **semantic consistency**.
3. **High Accuracy**: It is pre-trained on **natural language inference** tasks, making it adept at understanding relationships such as **entailment** and **contradiction**.
4. **Multi-Label Support**: It can evaluate multiple labels simultaneously, which is ideal for expressing **degrees of truthfulness**.

**Process:**
1. Input the **news** as the claim and the **context** as the hypothesis.
2. Compute a **truthfulness score** between **0 and 1**, where:
   - **0**: Completely **false**.
   - **1**: Completely **true**.
3. Generate **explanations** based on the score and suggest actions (e.g., further verification if uncertain).
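
A minimal sketch of this scoring step is shown below. Framing the claim together with the retrieved context as the zero-shot sequence, with `true`/`false` as candidate labels, is one plausible reading of the step above; the `truthfulness_score` helper is illustrative rather than the exact `app.py` code.

```python
# Sketch of Step 4: score the claim against the combined context with
# facebook/bart-large-mnli via the zero-shot classification pipeline.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def truthfulness_score(news: str, context: str) -> float:
    """Return a score in [0, 1]; higher means the claim is better supported by the context."""
    sequence = f"Context: {context}\nClaim: {news}"
    result = classifier(sequence, candidate_labels=["true", "false"])
    label_scores = dict(zip(result["labels"], result["scores"]))
    return label_scores["true"]
```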

---

### **End-to-End Example**

**Input News:**
"Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment."

**Context Built:**
- **FAISS Search:** Finds prior research on **quantum time reversal** and **entanglement theories**.
- **Online Search:** Retrieves recent articles discussing **quantum breakthroughs** and expert views.

**Model Evaluation:**
- The model compares the news with the combined context and outputs:
  **Score: 0.72** (Likely True).

**Result Explanation:**
```plaintext
News: "Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment."
Truthfulness Score: 0.72 (Likely true)
Analysis: You can reasonably trust this information, but further verification is always recommended for critical decisions.
```
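
Wiring the hypothetical helpers from the sketches above together, the end-to-end flow could look roughly like this (the 0.6/0.4 verdict thresholds are illustrative, not taken from the app):

```python
# Hypothetical end-to-end driver combining the sketches from Steps 1-4.
def check_truthfulness(news: str, index, documents: list[str]) -> str:
    context = build_context(
        retrieve_context(news, index, documents),  # Step 1: FAISS retrieval
        online_search(news),                       # Step 2: online search
    )
    score = truthfulness_score(news, context)      # Step 4: zero-shot scoring
    verdict = "Likely true" if score >= 0.6 else "Uncertain" if score >= 0.4 else "Likely false"
    return f"Truthfulness Score: {score:.2f} ({verdict})"
```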

---

### **Why This Approach Works**
1. **Balanced Context**: Combines static knowledge (the KG substitute) with dynamic knowledge (real-time search).
2. **Model Flexibility**: The zero-shot model adapts to diverse topics without retraining.
3. **Scalable and Cost-Effective**: Uses pre-trained models, FAISS indexing, and simple APIs for implementation.
4. **Interpretability**: Outputs include confidence scores and explanations for transparency.

This modular approach ensures that the **truthfulness assessment** is **scalable**, **explainable**, and **adaptable** to new domains.