---
title: Truthfulness Checker
emoji: 📰
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
license: apache-2.0
---
### **Implementation Steps: Validating Information with Context**
Validating the accuracy or degree of truthfulness of a given piece of information requires **context**—factual and relevant details surrounding the claim. Here’s how we approach this process step-by-step:
---
### **Step 1: Retrieving Context from a Knowledge Graph Substitute (FAISS with Semantic Search)**
Instead of relying on a **traditional Knowledge Graph (KG)**, we use **FAISS (Facebook AI Similarity Search)**, a **faster, scalable, and flexible alternative** for semantic search.
#### **Why FAISS is Better than a Traditional KG**
1. **Sentence-Level Retrieval**: Unlike traditional KGs that often rely on pre-defined **entities and relationships**, FAISS uses dense **embeddings** to directly match the **semantic meaning** of entire sentences.
2. **Scalable and High-Speed Retrieval**: FAISS efficiently handles **millions of embeddings**, making it highly scalable for real-world applications.
3. **Flexibility**: It works with **unstructured text**, removing the need to pre-process information into entities and relations, which is often time-consuming.
4. **Generalization**: FAISS enables **approximate nearest neighbor (ANN) search**, allowing retrieval of contextually related results, even if they are not exact matches.
#### **Dataset Used**
We leverage the **News Category Dataset** ([Kaggle Link](https://www.kaggle.com/datasets/rmisra/news-category-dataset)), which contains **news headlines and short descriptions** across various categories.
- **Why This Dataset?**
  - It covers a **wide range of topics**, making it useful for general-purpose context building.
  - Headlines and descriptions provide **rich semantic embeddings** for similarity searches.
  - Categories allow filtering of results when required (e.g., "science" or "technology").
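
A minimal sketch of how such an index could be built offline is shown below (the dataset file name, field names, and index type are assumptions, not necessarily what `app.py` does):

```python
# Sketch: build a FAISS index over the News Category Dataset.
# File name and field names are assumptions; adjust to the actual dataset file.
import json

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# The Kaggle dataset ships as JSON Lines: one record per line with
# "headline" and "short_description" fields.
texts = []
with open("News_Category_Dataset_v3.json", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        texts.append(f"{record['headline']}. {record['short_description']}")

# Encode to dense vectors and normalize so inner product equals cosine similarity.
embeddings = model.encode(texts, batch_size=256, show_progress_bar=True)
embeddings = np.asarray(embeddings, dtype="float32")
faiss.normalize_L2(embeddings)

# Flat inner-product index; an IVF or HNSW index could replace it at larger scale.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
faiss.write_index(index, "news_context.index")
```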
**Process:**
1. We use **SentenceTransformer (all-MiniLM-L6-v2)** to generate embeddings for the query (the input news).
2. We search against pre-computed embeddings stored in a **FAISS index** to retrieve the **top-K most relevant entries**.
3. These results form the **initial context**, capturing related information already present in the dataset.
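
At query time, the lookup might look roughly like this (reusing the index, model, and `texts` list from the sketch above; the helper name is illustrative):

```python
# Sketch: query-time retrieval against the pre-built index.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
index = faiss.read_index("news_context.index")

def retrieve_context(query: str, texts: list[str], top_k: int = 5) -> list[str]:
    """Return the top-K dataset entries most similar to the query.

    `texts` must be the same list (in the same order) used to build the index.
    """
    query_vec = model.encode([query]).astype("float32")
    faiss.normalize_L2(query_vec)
    scores, ids = index.search(query_vec, top_k)
    return [texts[i] for i in ids[0] if i != -1]
```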
---
### **Step 2: Online Search for Real-Time Context**
To **augment** the context retrieved from FAISS, we incorporate **real-time online search** using an API.
#### **Why Online Search Is Critical**
- **Fresh Information**: News and facts evolve, especially in areas like **science, technology, or politics**. Online search ensures access to the **latest updates** that may not exist in the static dataset.
- **Diverse Sources**: It broadens the scope by pulling information from **multiple credible sources**, reducing bias and enhancing reliability.
- **Fact-Checking**: Search engines often index **trusted fact-checking websites** that we can incorporate into the context.
**Process:**
1. Query a search API with a **search query** derived from the input news.
2. Retrieve relevant snippets, headlines, or summaries.
3. Append these results to the **context** built using FAISS.
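
The README does not name a specific search provider; as one illustration, this step could be implemented with Google's Programmable Search (Custom Search JSON) API. The environment variable names below are assumptions:

```python
# Example online-search step; the actual API used by the app may differ.
import os

import requests

def online_search(query: str, num_results: int = 5) -> list[str]:
    """Fetch title + snippet pairs for the query from a web search API."""
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={
            "key": os.environ["SEARCH_API_KEY"],   # API key (assumed env var)
            "cx": os.environ["SEARCH_ENGINE_ID"],  # engine ID (assumed env var)
            "q": query,
            "num": num_results,
        },
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json().get("items", [])
    return [f"{item['title']}: {item['snippet']}" for item in items]
```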
---
### **Step 3: Building Context from Combined Sources**
Both FAISS-based retrieval and **online search results** are combined into a **single context string**. This provides a **comprehensive knowledge base** around the input information.
- **Why Combine Both?**
- FAISS offers **pre-indexed knowledge**—ideal for **static facts** or concepts.
- Online search complements it with **dynamic and up-to-date insights**—perfect for verifying **recent developments**.
This layered context improves the model’s ability to assess the **truthfulness** of the given information.
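
A minimal sketch of this merging step, reusing the illustrative helpers from Steps 1 and 2:

```python
# Combine static (FAISS) and dynamic (online search) evidence into one string.
def build_context(news: str, texts: list[str], top_k: int = 5) -> str:
    faiss_hits = retrieve_context(news, texts, top_k=top_k)  # Step 1
    search_hits = online_search(news, num_results=top_k)     # Step 2
    return " ".join(faiss_hits + search_hits)
```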
---
### **Step 4: Truthfulness Prediction with Zero-Shot Classification Model**
We use **facebook/bart-large-mnli**, a **zero-shot classification** model, for evaluation.
#### **Why BART-Large-MNLI?**
1. **Zero-Shot Capability**: It can handle claims and hypotheses without needing **task-specific training**—perfect for this flexible, multi-domain use case.
2. **Contextual Matching**: It compares the input claim (news) with the constructed context to assess **semantic consistency**.
3. **High Accuracy**: Pre-trained on **natural language inference tasks**, making it adept at understanding relationships like **entailment** and **contradiction**.
4. **Multi-Label Support**: Can evaluate multiple labels simultaneously, ideal for **degrees of truthfulness**.
**Process:**
1. Provide the **context** as the premise and the **news** as the claim (hypothesis) to verify.
2. Compute a **truthfulness score** between **0 and 1**, where:
- **0**: Completely **false**.
- **1**: Completely **true**.
3. Generate **explanations** based on the score and suggest actions (e.g., further verification if uncertain).
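
A rough sketch of this scoring step with `facebook/bart-large-mnli`, treating the built context as the NLI premise and the news claim as the hypothesis (the exact scoring logic in `app.py` may differ):

```python
# Sketch: NLI-based truthfulness scoring with facebook/bart-large-mnli.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
nli_model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

def truthfulness_score(news: str, context: str) -> float:
    """Probability that the context entails the claim (0 = false, 1 = true)."""
    inputs = tokenizer(context, news, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli_model(**inputs).logits
    probs = logits.softmax(dim=-1)[0]
    # For bart-large-mnli the label order is [contradiction, neutral, entailment].
    return probs[2].item()
```

Using the entailment probability keeps the score in the 0-to-1 range described above without any task-specific fine-tuning.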
---
### **End-to-End Example**
**Input News:**
"Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment."
**Context Built:**
- **FAISS Search:** Finds prior research on **quantum time reversal** and **entanglement theories**.
- **Online Search:** Retrieves recent articles discussing **quantum breakthroughs** and expert views.
**Model Evaluation:**
- Model compares the news with the combined context and outputs:
**Score: 0.72** (Likely True).
**Result Explanation:**
```plaintext
News: "Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment."
Truthfulness Score: 0.72 (Likely true)
Analysis: You can reasonably trust this information, but further verification is always recommended for critical decisions.
```
---
### **Why This Approach Works**
1. **Balanced Context**: Combines static knowledge (KG substitute) and dynamic knowledge (real-time search).
2. **Model Flexibility**: Zero-shot model adapts to diverse topics without retraining.
3. **Scalable and Cost-Effective**: Uses pre-trained models, FAISS indexing, and simple APIs for implementation.
4. **Interpretability**: Outputs include confidence scores and explanations for transparency.
This modular approach ensures that the **truthfulness assessment** is **scalable**, **explainable**, and **adaptable** to new domains.