## **Project Description**
This project is designed to evaluate the **truthfulness** of a given piece of text (e.g., news or facts) by assigning a **truthfulness score** between **0 (False)** and **1 (True)**. 

The system employs a **knowledge graph (KG)** and **online search** to retrieve relevant context. Using this context, a **zero-shot classification model** determines the likelihood of the input being true or false. 

The application is implemented as a **Gradio Web Interface** for ease of use, allowing users to input **text** or a **URL** for evaluation. 

---

## **Problem Statement** 
In today's digital world, misinformation spreads rapidly, making it challenging to determine the credibility of news articles and statements. 

The goal of this project is to create a **truth-verification tool** that: 
1. Accepts **raw text** or extracts content from a **URL**. 
2. Retrieves **related context** from a **knowledge graph (KG)** and **internet search**. 
3. Evaluates truthfulness based on the provided information. 
4. Outputs a **truthfulness score** and actionable insights to help users make informed decisions. 

---

## **Thought Process** 
### **Step 1: Context Retrieval** 
- **Why Needed?** A claim can only be checked against verified facts and existing knowledge.
- **Approach:** Retrieve contextual information from: 
 1. **Knowledge Graph (KG):** Finds semantically similar articles or facts using **FAISS** and **Sentence Transformers**. 
 2. **Online Search:** Queries real-time data through **OpenAI's API** to fetch relevant search results. 

### **Step 2: Truth Evaluation** 
- **Why Needed?** The final evaluation depends on aligning the input with retrieved context to determine its validity. 
- **Approach:** 
 - Use a **zero-shot classification model** (`facebook/bart-large-mnli`) to compare input and context. 
 - Assign a **probability score** indicating truthfulness. 
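
The two steps above can be sketched as a small pipeline. This is only an illustration of the control flow, not the project's actual code: `evaluate_claim`, `fake_kg`, `fake_search`, and `fake_scorer` are hypothetical names, and the stand-in functions replace the real retrievers and model so the sketch runs without an index or API key.

```python
from typing import Callable, Iterable


def evaluate_claim(
    claim: str,
    retrievers: Iterable[Callable[[str], str]],
    scorer: Callable[[str, str], float],
) -> float:
    """Gather context from every retriever, then score the claim against it."""
    # Step 1: collect context, skipping sources that return nothing.
    pieces = [text for text in (retrieve(claim) for retrieve in retrievers) if text]
    context = "\n".join(pieces)
    # Step 2: score the claim against the context and clamp to [0, 1].
    score = scorer(claim, context)
    return max(0.0, min(1.0, score))


# Hypothetical stand-ins (assumptions, not the real modules).
def fake_kg(query: str) -> str:
    return "Archived headline about " + query


def fake_search(query: str) -> str:
    return ""  # simulates a search that found nothing


def fake_scorer(claim: str, context: str) -> float:
    return 1.0 if claim in context else 0.0


print(evaluate_claim("negative time", [fake_kg, fake_search], fake_scorer))
```

Passing the retrievers and the scorer in as arguments keeps each stage replaceable, which is convenient when testing without network access.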

---

In [1]:
!python --version

Python 3.10.16


In [2]:
import importlib

In [3]:
from dotenv import load_dotenv
import os
 

# Load environment variables from .env file
load_dotenv()

# Fetch values from the .env file
SEARCH_API_KEY = os.getenv("SEARCH_API_KEY")
SEARCH_BASE_URL = os.getenv("SEARCH_BASE_URL")
SEARCH_MODEL = os.getenv("SEARCH_MODEL")
KG_INDEX_PATH = "KG/news_category_index.faiss"
KG_DATASET_PATH = "KG/News_Category_Dataset_v3.json"
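
`os.getenv` returns `None` when a variable is absent, so a missing `.env` entry would only surface later as an API error. A minimal sketch of an early check follows; `require_env` is a hypothetical helper, not part of the project's modules.

```python
import os
from typing import Dict, Iterable


def require_env(names: Iterable[str]) -> Dict[str, str]:
    """Return the requested variables, raising if any is unset or empty."""
    names = list(names)
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}
```

It could be called right after `load_dotenv()`, for example `require_env(["SEARCH_API_KEY", "SEARCH_BASE_URL", "SEARCH_MODEL"])`.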

In [4]:
import importlib
import modules.online_search, modules.knowledge_graph, modules.validation 

# Reload the modules so that local edits are picked up
importlib.reload(modules.online_search)
importlib.reload(modules.knowledge_graph)
importlib.reload(modules.validation)

# Re-import the specific functions to ensure the latest versions
from modules.online_search import search_online
from modules.validation import calculate_truthfulness_score
from modules.knowledge_graph import search_kg




In [5]:
# Input: Information to validate (news or claim)
news = "Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment" 

# Context: Supporting information retrieved from knowledge graphs and online searches
context = ""

### **Step 1: Retrieving Context from Knowledge Graph Substitute - FAISS with Semantic Search** 
Instead of relying on a **traditional Knowledge Graph (KG)**, we use **FAISS (Facebook AI Similarity Search)**, a **faster, scalable, and flexible alternative** for semantic search. 

#### **Why FAISS is Better than a Traditional KG** 
1. **Sentence-Level Retrieval**: Unlike traditional KGs that often rely on pre-defined **entities and relationships**, FAISS uses dense **embeddings** to directly match the **semantic meaning** of entire sentences. 
2. **Scalable and High-Speed Retrieval**: FAISS efficiently handles **millions of embeddings**, making it highly scalable for real-world applications. 
3. **Flexibility**: It works with **unstructured text**, removing the need to pre-process information into entities and relations, which is often time-consuming. 
4. **Generalization**: FAISS supports both exact and **approximate nearest neighbor (ANN) search** over embeddings, so it retrieves contextually related results even when the wording does not match exactly.

#### **Dataset Used** 
We leverage the **News Category Dataset** ([Kaggle Link](https://www.kaggle.com/datasets/rmisra/news-category-dataset)), which contains **news headlines and short descriptions** across various categories. 

- **Why This Dataset?** 
 - It covers a **wide range of topics**, making it useful for general-purpose context building. 
 - Headlines and descriptions provide **rich semantic embeddings** for similarity searches. 
 - Categories allow filtering relevant results if required (e.g., "science" or "technology").

**Process:**
1. We use **SentenceTransformer (all-MiniLM-L6-v2)** to generate embeddings for the query (the input news). 
2. We search against pre-computed embeddings stored in a **FAISS index** to retrieve the **top-K most relevant entries**. 
3. These results form the **initial context**, capturing related information already present in the dataset.
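
The ranking logic behind the process above can be sketched in plain Python. This is a brute-force stand-in for illustration only: it uses hand-written 3-dimensional toy vectors instead of SentenceTransformer embeddings, and a linear scan instead of a FAISS index. The names `cosine` and `top_k` are hypothetical and do not come from `modules.knowledge_graph`.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs for the k most similar documents."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: real embeddings have hundreds of dimensions.
docs = ["Arrow of time thought experiment", "Rethinking time", "Celebrity gossip"]
doc_vecs = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.1], [0.0, 0.1, 0.9]]
query_vec = [1.0, 0.0, 0.0]

for idx, score in top_k(query_vec, doc_vecs, k=2):
    print(f"{score:.2f}  {docs[idx]}")
```

FAISS performs the same kind of nearest-neighbor lookup far more efficiently over large collections; the scan here is only meant to show what "top-K most relevant" means.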

In [6]:
kg_content = search_kg(query=news, index_path=KG_INDEX_PATH, dataset_path=KG_DATASET_PATH)

In [7]:
print(kg_content)

Simple Thought Experiment Helps Answer 'Arrow Of Time' Question. Rethinking Time. Everyone in life must face and move through time. Feeling how precious life is, we tend to squeeze as much as we can out of the moment. Is There a Way Out of Negative Cycles of Thought?. I have a choice between the thoughts I keep and the ones I need to trash, depending on whether they come from a place of light or of darkness. To me, these thoughts of light are those that stem from a divine source, which thoughts of darkness don't have.


### **Step 2: Online Search for Real-Time Context** 
To **augment** the context retrieved from FAISS, we incorporate **real-time online search** using an API. 

#### **Why Is Online Search Critical?** 
- **Fresh Information**: News and facts evolve, especially in areas like **science, technology, or politics**. Online search ensures access to the **latest updates** that may not exist in the static dataset. 
- **Diverse Sources**: It broadens the scope by pulling information from **multiple credible sources**, reducing bias and enhancing reliability. 
- **Fact-Checking**: Search engines often index **trusted fact-checking websites** that we can incorporate into the context.

**Process:**
1. Use an API with a **search query** derived from the input news. 
2. Retrieve relevant snippets, headlines, or summaries. 
3. Append these results to the **context** built using FAISS.

In [8]:
online_search_results = search_online(query=news, api_key=SEARCH_API_KEY, base_url=SEARCH_BASE_URL, model=SEARCH_MODEL)

In [9]:
print(type(online_search_results)) 
online_search_results['message_content']




'The recent experiment conducted by scientists at the University of Toronto has made a significant breakthrough in the field of quantum mechanics, demonstrating a phenomenon known as "negative time." Here are the key points from this groundbreaking study:\n\n## Experimental Setup and Observations\nThe researchers used a sophisticated setup involving photon pulses passed through a cloud of ultracold atoms, typically at near absolute zero temperatures. When these photons interacted with the atoms, they caused atomic excitation, where the electrons in the atoms jumped to a higher energy state. The interesting aspect was observed when the photons were not absorbed by the atoms; despite this, the atoms still showed excitation for the exact amount of time as if the photons had been absorbed[2][4][5].\n\n## Negative Time Phenomenon\nThe experiment showed that photons could appear to exit the medium before they entered it. This is described as a "negative time delay" or "negative group delay."

### **Step 3: Building Context from Combined Sources** 
Both FAISS-based retrieval and **online search results** are combined into a **single context string**. This provides a **comprehensive knowledge base** around the input information. 

- **Why Combine Both?** 
 - FAISS offers **pre-indexed knowledge**—ideal for **static facts** or concepts. 
 - Online search complements it with **dynamic and up-to-date insights**—perfect for verifying **recent developments**. 

This layered context improves the model’s ability to assess the **truthfulness** of the given information.
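
The notebook joins the two strings with a newline. A slightly more defensive variant is sketched here as an assumption rather than the project's method: it skips empty results and caps the length of each source so that one long search answer cannot crowd out the other. `build_context` and `max_chars_per_source` are hypothetical names, and the character cap is only a rough proxy for the model's token limit.

```python
from typing import Iterable


def build_context(parts: Iterable[str], max_chars_per_source: int = 1500) -> str:
    """Join non-empty sources with newlines, truncating each to a fixed budget."""
    kept = []
    for text in parts:
        text = text.strip()
        if not text:
            continue  # skip sources that returned nothing
        kept.append(text[:max_chars_per_source])
    return "\n".join(kept)


print(build_context(["KG result about time.", "", "Online summary of the experiment."]))
```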


In [11]:
context = kg_content + '\n' + online_search_results['message_content']

In [12]:
print(context)

Simple Thought Experiment Helps Answer 'Arrow Of Time' Question. Rethinking Time. Everyone in life must face and move through time. Feeling how precious life is, we tend to squeeze as much as we can out of the moment. Is There a Way Out of Negative Cycles of Thought?. I have a choice between the thoughts I keep and the ones I need to trash, depending on whether they come from a place of light or of darkness. To me, these thoughts of light are those that stem from a divine source, which thoughts of darkness don't have.
The recent experiment conducted by scientists at the University of Toronto has made a significant breakthrough in the field of quantum mechanics, demonstrating a phenomenon known as "negative time." Here are the key points from this groundbreaking study:

## Experimental Setup and Observations
The researchers used a sophisticated setup involving photon pulses passed through a cloud of ultracold atoms, typically at near absolute zero temperatures. When these photons intera

### **Step 4: Truthfulness Prediction with Zero-Shot Classification Model** 
We use **BART-Large-MNLI** (`facebook/bart-large-mnli`), an NLI model commonly used for **zero-shot classification**, for evaluation. 

#### **Why BART-Large-MNLI?** 
1. **Zero-Shot Capability**: It can handle claims and hypotheses without needing **task-specific training**—perfect for this flexible, multi-domain use case. 
2. **Contextual Matching**: It compares the input claim (news) with the constructed context to assess **semantic consistency**. 
3. **High Accuracy**: Pre-trained on **natural language inference tasks**, making it adept at understanding relationships like **entailment** and **contradiction**. 
4. **Multi-Label Support**: Can evaluate multiple labels simultaneously, ideal for **degrees of truthfulness**.

**Process:**
1. Input the **news** as the claim to verify and the **context** as the evidence it is checked against. 
2. Compute a **truthfulness score** between **0 and 1**, where: 
 - **0**: Completely **false**. 
 - **1**: Completely **true**. 
3. Generate **explanations** based on the score and suggest actions (e.g., further verification if uncertain).
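
One common way to turn NLI output into a single score between 0 and 1 is to apply a softmax over only the contradiction and entailment logits and keep the entailment probability. The sketch below shows that arithmetic on made-up logits. It is not necessarily what `calculate_truthfulness_score` does internally, and `entailment_score` is a hypothetical name. The label order (contradiction, neutral, entailment) is the one we understand `facebook/bart-large-mnli` to use.

```python
import math
from typing import Sequence

# Assumed label order for facebook/bart-large-mnli.
CONTRADICTION, NEUTRAL, ENTAILMENT = 0, 1, 2


def entailment_score(logits: Sequence[float]) -> float:
    """Softmax over the contradiction and entailment logits, ignoring neutral."""
    c, e = logits[CONTRADICTION], logits[ENTAILMENT]
    m = max(c, e)  # subtract the max for numerical stability
    exp_c, exp_e = math.exp(c - m), math.exp(e - m)
    return exp_e / (exp_c + exp_e)


# Made-up logits for illustration, not real model output.
print(round(entailment_score([-1.2, 0.3, 2.1]), 2))
```

Equal contradiction and entailment logits give 0.5, which matches the intuition of an "uncertain" claim.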


In [13]:
truth_score = calculate_truthfulness_score(info=news, context=context)

Device set to use cpu


In [14]:

# Generate explanation based on the score
if truth_score > 0.7:
    status = "likely true"
    recommendation = "You can reasonably trust this information, but further verification is always recommended for critical decisions."
elif truth_score > 0.4:
    status = "uncertain"
    recommendation = "This information might be partially true, but additional investigation is required before accepting it as fact."
else:
    status = "unlikely to be true"
    recommendation = "It is recommended to verify this information through multiple reliable sources before trusting it."

# Print result with explanation
print(f"News: \"{news}\"")
print(f"Truthfulness Score: {truth_score:.2f} ({status.capitalize()})")
print(f"Analysis: {recommendation}")


News: "Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment"
Truthfulness Score: 0.89 (Likely true)
Analysis: You can reasonably trust this information, but further verification is always recommended for critical decisions.
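
Since the same thresholds are applied to every claim, the branching can be wrapped in a function. This is a sketch that reuses the notebook's thresholds and messages unchanged; `explain_score` is a hypothetical name.

```python
from typing import Tuple


def explain_score(score: float) -> Tuple[str, str]:
    """Map a truthfulness score to a (status, recommendation) pair."""
    if score > 0.7:
        return (
            "likely true",
            "You can reasonably trust this information, but further verification "
            "is always recommended for critical decisions.",
        )
    if score > 0.4:
        return (
            "uncertain",
            "This information might be partially true, but additional investigation "
            "is required before accepting it as fact.",
        )
    return (
        "unlikely to be true",
        "It is recommended to verify this information through multiple reliable "
        "sources before trusting it.",
    )


status, recommendation = explain_score(0.89)
print(f"Truthfulness Score: {0.89:.2f} ({status.capitalize()})")
```

Note that the comparisons are strict, so a score of exactly 0.7 falls into "uncertain" and exactly 0.4 into "unlikely to be true".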


### **Check for False News** 

In [27]:
news = "A recent article has reported on a shocking incident that occurred during a New Year's Eve celebration in New Orleans. According to the news, a suspect named Shamsud Din Jabbar was identified following a tragic attack where a vehicle rammed into a crowd on Bourbon Street, resulting in 10 fatalities and numerous injuries. Eyewitnesses described the scene as chaotic, with the driver reportedly exiting the vehicle and opening fire on the crowd before fleeing the scene. Authorities have since confirmed that an improvised explosive device was found at the location, leading to investigations into the incident being classified as a terrorist attack. In a bizarre twist, social media platforms are now buzzing with claims that Jabbar was actually an undercover agent working to infiltrate extremist groups. This unverified information has sparked widespread speculation and conspiracy theories online, with some claiming that the attack was staged as part of a larger government operation. However, no credible evidence has surfaced to support these claims, and officials have urged the public to refrain from spreading misinformation while investigations are ongoing."

In [28]:
context = ""

In [21]:
# Retrieve all relevant info related to this news
kg_content = search_kg(query=news, index_path=KG_INDEX_PATH, dataset_path=KG_DATASET_PATH)

In [22]:
# Search online for more context
online_search_results = search_online(query=news, api_key=SEARCH_API_KEY, base_url=SEARCH_BASE_URL, model=SEARCH_MODEL)

In [23]:
# Combine the FAISS-based retrieval and the online search results into a single context string
context = kg_content + '\n' + online_search_results['message_content']

In [24]:
print(context)

Friday's Morning Email: 'Terrorist Incident' Rocks London Tube. Multiple people were injured. New Details Emerge On Suspect In London Terror Attack. ISIS claimed responsibility for the attack but it's unclear to what extent the group was involved.
The recent incident in New Orleans involved a tragic and violent attack that occurred during New Year's Eve celebrations on Bourbon Street. Here are the key details:

## Suspect Identification
The suspect has been identified as Shamsud Din Jabbar, a 42-year-old U.S. citizen from Texas. Jabbar was an honorably discharged U.S. Army veteran who had converted to Islam at some point in his life[4].

## Incident Details
On January 1, 2025, around 3:15 a.m., Jabbar drove a vehicle into a crowd of people on Bourbon Street in the French Quarter of New Orleans. This act was described by authorities as "very intentional behavior" aimed at causing maximum harm. After crashing the vehicle, Jabbar exited and opened fire on the crowd and police officers, re

In [25]:
# Estimate truth score
truth_score = calculate_truthfulness_score(info=news, context=context)

Device set to use cpu


In [29]:

# Generate explanation based on the score
if truth_score > 0.7:
    status = "likely true"
    recommendation = "You can reasonably trust this information, but further verification is always recommended for critical decisions."
elif truth_score > 0.4:
    status = "uncertain"
    recommendation = "This information might be partially true, but additional investigation is required before accepting it as fact."
else:
    status = "unlikely to be true"
    recommendation = "It is recommended to verify this information through multiple reliable sources before trusting it."

# Print result with explanation
print(f"News: \"{news}\"")
print(f"Truthfulness Score: {truth_score:.2f} ({status.capitalize()})")
print(f"Analysis: {recommendation}")


News: "A recent article has reported on a shocking incident that occurred during a New Year's Eve celebration in New Orleans. According to the news, a suspect named Shamsud Din Jabbar was identified following a tragic attack where a vehicle rammed into a crowd on Bourbon Street, resulting in 10 fatalities and numerous injuries. Eyewitnesses described the scene as chaotic, with the driver reportedly exiting the vehicle and opening fire on the crowd before fleeing the scene. Authorities have since confirmed that an improvised explosive device was found at the location, leading to investigations into the incident being classified as a terrorist attack. In a bizarre twist, social media platforms are now buzzing with claims that Jabbar was actually an undercover agent working to infiltrate extremist groups. This unverified information has sparked widespread speculation and conspiracy theories online, with some claiming that the attack was staged as part of a larger government operation. How