{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### **Project Description** \n", "This project is designed to evaluate the **truthfulness** of a given piece of text (e.g., news or facts) by assigning a **truthfulness score** between **0 (False)** and **1 (True)**. \n", "\n", "The system employs a **knowledge graph (KG)** and **online search** to retrieve relevant context. Using this context, a **zero-shot classification model** determines the likelihood of the input being true or false. \n", "\n", "The application is implemented as a **Gradio Web Interface** for ease of use, allowing users to input **text** or a **URL** for evaluation. \n", "\n", "---\n", "\n", "## **Problem Statement** \n", "In today's digital world, misinformation spreads rapidly, making it challenging to determine the credibility of news articles and statements. \n", "\n", "The goal of this project is to create a **truth-verification tool** that: \n", "1. Accepts **raw text** or extracts content from a **URL**. \n", "2. Retrieves **related context** from a **knowledge graph (KG)** and **internet search**. \n", "3. Evaluates truthfulness based on the provided information. \n", "4. Outputs a **truthfulness score** and actionable insights to help users make informed decisions. \n", "\n", "---\n", "\n", "## **Thought Process** \n", "### **Step 1: Context Retrieval** \n", "- **Why Needed?** Misinformation can only be validated against verified facts and existing knowledge. \n", "- **Approach:** Retrieve contextual information from: \n", " 1. **Knowledge Graph (KG):** Finds semantically similar articles or facts using **FAISS** and **Sentence Transformers**. \n", " 2. **Online Search:** Queries real-time data through **OpenAI's API** to fetch relevant search results. \n", "\n", "### **Step 2: Truth Evaluation** \n", "- **Why Needed?** The final evaluation depends on aligning the input with retrieved context to determine its validity. 
\n", "- **Approach:** \n", " - Use a **zero-shot classification model** (`facebook/bart-large-mnli`) to compare input and context. \n", " - Assign a **probability score** indicating truthfulness. \n", "\n", "---" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python 3.10.16\n" ] } ], "source": [ "!python --version" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import importlib" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from dotenv import load_dotenv\n", "import os\n", " \n", "\n", "# Load environment variables from .env file\n", "load_dotenv()\n", "\n", "# Fetch values from the .env file\n", "SEARCH_API_KEY = os.getenv(\"SEARCH_API_KEY\")\n", "SEARCH_BASE_URL = os.getenv(\"SEARCH_BASE_URL\")\n", "SEARCH_MODEL = os.getenv(\"SEARCH_MODEL\")\n", "KG_INDEX_PATH=\"KG/news_category_index.faiss\"\n", "KG_DATASET_PATH=\"KG/News_Category_Dataset_v3.json\"" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "d:\\dev\\Hack\\EchoTruth\\env\\lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] } ], "source": [ "import importlib\n", "import modules.online_search, modules.knowledge_graph, modules.validation\n", "\n", "# Reload the modules so that recent edits are picked up\n", "importlib.reload(modules.online_search)\n", "importlib.reload(modules.knowledge_graph)\n", "importlib.reload(modules.validation)\n", "\n", "# Re-import the functions to ensure the latest versions are used\n", "from modules.online_search import search_online\n", "from modules.validation import calculate_truthfulness_score\n", "from modules.knowledge_graph import search_kg\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Input: Information to validate (news or claim)\n", "news = \"Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment\" \n", "\n", "# Context: Supporting information retrieved from knowledge graphs and online searches\n", "context = \"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Step 1: Retrieving Context from Knowledge Graph Substitute - FAISS with Semantic Search** \n", "Instead of relying on a **traditional Knowledge Graph (KG)**, we use **FAISS (Facebook AI Similarity Search)**, a **fast, scalable, and flexible alternative** for semantic search. \n", "\n", "#### **Why FAISS Instead of a Traditional KG** \n", "1. **Sentence-Level Retrieval**: Unlike traditional KGs that often rely on pre-defined **entities and relationships**, FAISS uses dense **embeddings** to directly match the **semantic meaning** of entire sentences. \n", "2. **Scalable and High-Speed Retrieval**: FAISS efficiently handles **millions of embeddings**, making it highly scalable for real-world applications. \n", "3. **Flexibility**: It works with **unstructured text**, removing the need to pre-process information into entities and relations, which is often time-consuming. \n", "4. 
**Generalization**: FAISS supports **approximate nearest neighbor (ANN) search**, allowing retrieval of contextually related results, even if they are not exact matches.\n", "\n", "#### **Dataset Used** \n", "We leverage the **News Category Dataset** ([Kaggle Link](https://www.kaggle.com/datasets/rmisra/news-category-dataset)), which contains **news headlines and short descriptions** across various categories. \n", "\n", "- **Why This Dataset?** \n", " - It covers a **wide range of topics**, making it useful for general-purpose context building. \n", " - Headlines and descriptions provide **rich semantic embeddings** for similarity searches. \n", " - Categories allow filtering relevant results if required (e.g., \"science\" or \"technology\").\n", "\n", "**Process:**\n", "1. We use **SentenceTransformer (all-MiniLM-L6-v2)** to generate an embedding for the query (the input news). \n", "2. We search against pre-computed embeddings stored in a **FAISS index** to retrieve the **top-K most relevant entries**. \n", "3. These results form the **initial context**, capturing related information already present in the dataset." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "kg_content = search_kg(query=news, index_path=KG_INDEX_PATH, dataset_path=KG_DATASET_PATH)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Simple Thought Experiment Helps Answer 'Arrow Of Time' Question. Rethinking Time. Everyone in life must face and move through time. Feeling how precious life is, we tend to squeeze as much as we can out of the moment. Is There a Way Out of Negative Cycles of Thought?. I have a choice between the thoughts I keep and the ones I need to trash, depending on whether they come from a place of light or of darkness. 
To me, these thoughts of light are those that stem from a divine source, which thoughts of darkness don't have.\n" ] } ], "source": [ "print(kg_content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Step 2: Online Search for Real-Time Context** \n", "To **augment** the context retrieved from FAISS, we incorporate **real-time online search** using an API. \n", "\n", "#### **Why Is Online Search Critical?** \n", "- **Fresh Information**: News and facts evolve, especially in areas like **science, technology, or politics**. Online search gives access to the **latest updates** that may not exist in the static dataset. \n", "- **Diverse Sources**: It broadens the scope by pulling information from **multiple credible sources**, reducing bias and enhancing reliability. \n", "- **Fact-Checking**: Search engines often index **trusted fact-checking websites** that we can incorporate into the context.\n", "\n", "**Process:**\n", "1. Call the search API with a **search query** derived from the input news. \n", "2. Retrieve relevant snippets, headlines, or summaries. \n", "3. Append these results to the **context** built using FAISS." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "online_search_results = search_online(query=news, api_key=SEARCH_API_KEY, base_url=SEARCH_BASE_URL, model=SEARCH_MODEL)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/plain": [ "'The recent experiment conducted by scientists at the University of Toronto has made a significant breakthrough in the field of quantum mechanics, demonstrating a phenomenon known as \"negative time.\" Here are the key points from this groundbreaking study:\\n\\n## Experimental Setup and Observations\\nThe researchers used a sophisticated setup involving photon pulses passed through a cloud of ultracold atoms, typically at near absolute zero temperatures. When these photons interacted with the atoms, they caused atomic excitation, where the electrons in the atoms jumped to a higher energy state. The interesting aspect was observed when the photons were not absorbed by the atoms; despite this, the atoms still showed excitation for the exact amount of time as if the photons had been absorbed[2][4][5].\\n\\n## Negative Time Phenomenon\\nThe experiment showed that photons could appear to exit the medium before they entered it. This is described as a \"negative time delay\" or \"negative group delay.\" Essentially, when photons were absorbed and then re-emitted, they did so without any delay, or even before the atoms could de-excite. This behavior suggests that the photons were interacting with the atoms in such a way that the excitation time of the atoms corresponded to a negative value[2][4][5].\\n\\n## Quantum Mechanics Context\\nIn the quantum realm, this phenomenon is not paradoxical but rather a consequence of quantum mechanics. The concept of \"weak values\" in quantum theory allows for measurements that can take on values outside the normal expected range. 
Here, the weak value of the atomic excitation time was found to be negative, corresponding to the negative group delay observed. This means that, from the perspective of the measurement, the atoms were excited before the light even arrived[4].\\n\\n## Implications and Significance\\nThe study challenges traditional views of light-matter interactions and suggests that negative time has more physical significance than previously appreciated, particularly in the field of optics. While this does not change our fundamental understanding of time, it has implications for quantum technology, such as improving quantum memory and communication systems by enhancing the control of photon-atom interactions[2][4][5].\\n\\n## Reaction and Future Research\\nThe findings, although yet to be peer-reviewed, have generated significant interest and discussion within the scientific community. Physicist Aephraim Steinberg, who led the study, expressed excitement about the potential for deeper inquiry into the mysteries of quantum physics. The study is expected to spur further investigation into the nature of time and quantum mechanics[1][5].'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(type(online_search_results)) \n", "online_search_results['message_content']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Step 3: Building Context from Combined Sources** \n", "Both FAISS-based retrieval and **online search results** are combined into a **single context string**. This provides a **comprehensive knowledge base** around the input information. \n", "\n", "- **Why Combine Both?** \n", " - FAISS offers **pre-indexed knowledge**—ideal for **static facts** or concepts. \n", " - Online search complements it with **dynamic and up-to-date insights**—perfect for verifying **recent developments**. 
\n", "\n", "This layered context improves the model’s ability to assess the **truthfulness** of the given information.\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "context = kg_content + '\\n' + online_search_results['message_content']" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Simple Thought Experiment Helps Answer 'Arrow Of Time' Question. Rethinking Time. Everyone in life must face and move through time. Feeling how precious life is, we tend to squeeze as much as we can out of the moment. Is There a Way Out of Negative Cycles of Thought?. I have a choice between the thoughts I keep and the ones I need to trash, depending on whether they come from a place of light or of darkness. To me, these thoughts of light are those that stem from a divine source, which thoughts of darkness don't have.\n", "The recent experiment conducted by scientists at the University of Toronto has made a significant breakthrough in the field of quantum mechanics, demonstrating a phenomenon known as \"negative time.\" Here are the key points from this groundbreaking study:\n", "\n", "## Experimental Setup and Observations\n", "The researchers used a sophisticated setup involving photon pulses passed through a cloud of ultracold atoms, typically at near absolute zero temperatures. When these photons interacted with the atoms, they caused atomic excitation, where the electrons in the atoms jumped to a higher energy state. The interesting aspect was observed when the photons were not absorbed by the atoms; despite this, the atoms still showed excitation for the exact amount of time as if the photons had been absorbed[2][4][5].\n", "\n", "## Negative Time Phenomenon\n", "The experiment showed that photons could appear to exit the medium before they entered it. 
This is described as a \"negative time delay\" or \"negative group delay.\" Essentially, when photons were absorbed and then re-emitted, they did so without any delay, or even before the atoms could de-excite. This behavior suggests that the photons were interacting with the atoms in such a way that the excitation time of the atoms corresponded to a negative value[2][4][5].\n", "\n", "## Quantum Mechanics Context\n", "In the quantum realm, this phenomenon is not paradoxical but rather a consequence of quantum mechanics. The concept of \"weak values\" in quantum theory allows for measurements that can take on values outside the normal expected range. Here, the weak value of the atomic excitation time was found to be negative, corresponding to the negative group delay observed. This means that, from the perspective of the measurement, the atoms were excited before the light even arrived[4].\n", "\n", "## Implications and Significance\n", "The study challenges traditional views of light-matter interactions and suggests that negative time has more physical significance than previously appreciated, particularly in the field of optics. While this does not change our fundamental understanding of time, it has implications for quantum technology, such as improving quantum memory and communication systems by enhancing the control of photon-atom interactions[2][4][5].\n", "\n", "## Reaction and Future Research\n", "The findings, although yet to be peer-reviewed, have generated significant interest and discussion within the scientific community. Physicist Aephraim Steinberg, who led the study, expressed excitement about the potential for deeper inquiry into the mysteries of quantum physics. 
The study is expected to spur further investigation into the nature of time and quantum mechanics[1][5].\n" ] } ], "source": [ "print(context)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Step 4: Truthfulness Prediction with Zero-Shot Classification Model** \n", "We use **facebook/bart-large-mnli**, a **zero-shot classification** model, for evaluation. \n", "\n", "#### **Why BART-Large-MNLI?** \n", "1. **Zero-Shot Capability**: It can classify claims without needing **task-specific training**, which suits this flexible, multi-domain use case. \n", "2. **Contextual Matching**: It compares the input claim (news) with the constructed context to assess **semantic consistency**. \n", "3. **NLI Training**: Fine-tuned on a **natural language inference** dataset (MNLI), making it adept at understanding relationships like **entailment** and **contradiction**. \n", "4. **Multi-Label Support**: Can evaluate multiple labels simultaneously, which is useful for expressing **degrees of truthfulness**.\n", "\n", "**Process:**\n", "1. Provide the **news** as the claim to evaluate and the **context** as the supporting evidence. \n", "2. Compute a **truthfulness score** between **0 and 1**, where: \n", " - **0**: Completely **false**. \n", " - **1**: Completely **true**. \n", "3. 
Generate **explanations** based on the score and suggest actions (e.g., further verification if uncertain).\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Device set to use cpu\n" ] } ], "source": [ "truth_score = calculate_truthfulness_score(info=news, context=context)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "News: \"Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment\"\n", "Truthfulness Score: 0.89 (Likely true)\n", "Analysis: You can reasonably trust this information, but further verification is always recommended for critical decisions.\n" ] } ], "source": [ "\n", "# Generate explanation based on the score\n", "if truth_score > 0.7:\n", " status = \"likely true\"\n", " recommendation = \"You can reasonably trust this information, but further verification is always recommended for critical decisions.\"\n", "elif truth_score > 0.4:\n", " status = \"uncertain\"\n", " recommendation = \"This information might be partially true, but additional investigation is required before accepting it as fact.\"\n", "else:\n", " status = \"unlikely to be true\"\n", " recommendation = \"It is recommended to verify this information through multiple reliable sources before trusting it.\"\n", "\n", "# Print result with explanation\n", "print(f\"News: \\\"{news}\\\"\")\n", "print(f\"Truthfulness Score: {truth_score:.2f} ({status.capitalize()})\")\n", "print(f\"Analysis: {recommendation}\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Checking a False News Item** " ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "news = \"A recent article has reported on a shocking incident that occurred during a New Year's Eve celebration in New Orleans. 
According to the news, a suspect named Shamsud Din Jabbar was identified following a tragic attack where a vehicle rammed into a crowd on Bourbon Street, resulting in 10 fatalities and numerous injuries. Eyewitnesses described the scene as chaotic, with the driver reportedly exiting the vehicle and opening fire on the crowd before fleeing the scene. Authorities have since confirmed that an improvised explosive device was found at the location, leading to investigations into the incident being classified as a terrorist attack. In a bizarre twist, social media platforms are now buzzing with claims that Jabbar was actually an undercover agent working to infiltrate extremist groups. This unverified information has sparked widespread speculation and conspiracy theories online, with some claiming that the attack was staged as part of a larger government operation. However, no credible evidence has surfaced to support these claims, and officials have urged the public to refrain from spreading misinformation while investigations are ongoing.\"" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "context = \"\"" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# Retrieve all relevant info related to this news\n", "kg_content = search_kg(query=news, index_path=KG_INDEX_PATH, dataset_path=KG_DATASET_PATH)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# Search online for more context\n", "online_search_results = search_online(query=news, api_key=SEARCH_API_KEY, base_url=SEARCH_BASE_URL, model=SEARCH_MODEL)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# Combine the FAISS-based retrieval and online search results into a single context string around the input information.
\n", "context = kg_content + '\\n' + online_search_results['message_content']" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Friday's Morning Email: 'Terrorist Incident' Rocks London Tube. Multiple people were injured. New Details Emerge On Suspect In London Terror Attack. ISIS claimed responsibility for the attack but it's unclear to what extent the group was involved.\n", "The recent incident in New Orleans involved a tragic and violent attack that occurred during New Year's Eve celebrations on Bourbon Street. Here are the key details:\n", "\n", "## Suspect Identification\n", "The suspect has been identified as Shamsud Din Jabbar, a 42-year-old U.S. citizen from Texas. Jabbar was an honorably discharged U.S. Army veteran who had converted to Islam at some point in his life[4].\n", "\n", "## Incident Details\n", "On January 1, 2025, around 3:15 a.m., Jabbar drove a vehicle into a crowd of people on Bourbon Street in the French Quarter of New Orleans. This act was described by authorities as \"very intentional behavior\" aimed at causing maximum harm. After crashing the vehicle, Jabbar exited and opened fire on the crowd and police officers, resulting in a shootout with law enforcement. Jabbar was subsequently killed in the exchange with police[2][3][5].\n", "\n", "## Casualties\n", "The attack resulted in the deaths of 15 people and injured 35 others. Most of the victims were local residents rather than tourists. Two police officers were also injured during the shootout but are in stable condition[1][2][3].\n", "\n", "## Investigation\n", "The FBI is leading the investigation, which is being classified as an act of terrorism. Authorities found at least one suspected improvised explosive device at the scene, further supporting the terrorism classification[2][3][5].\n", "\n", "## Background of the Suspect\n", "Jabbar had a military background, serving in the U.S. 
Army as a human resources and information technology specialist. He had been living in Texas and had minor criminal infractions in the past. In recent months, he had been acting erratically according to his family members[4].\n", "\n", "## Conspiracy Theories\n", "Despite the official investigation and findings, there are unverified claims circulating on social media suggesting that Jabbar might have been an undercover agent involved in a larger government operation. However, there is no credible evidence to support these conspiracy theories, and officials have urged the public to avoid spreading misinformation while the investigation is ongoing. President Biden has stated that the suspect appears to have been inspired by ISIS, which aligns with the terrorism investigation[2][3][5].\n" ] } ], "source": [ "print(context)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Device set to use cpu\n" ] } ], "source": [ "# Estimate truth score\n", "truth_score = calculate_truthfulness_score(info=news, context=context)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "News: \"A recent article has reported on a shocking incident that occurred during a New Year's Eve celebration in New Orleans. According to the news, a suspect named Shamsud Din Jabbar was identified following a tragic attack where a vehicle rammed into a crowd on Bourbon Street, resulting in 10 fatalities and numerous injuries. Eyewitnesses described the scene as chaotic, with the driver reportedly exiting the vehicle and opening fire on the crowd before fleeing the scene. Authorities have since confirmed that an improvised explosive device was found at the location, leading to investigations into the incident being classified as a terrorist attack. 
In a bizarre twist, social media platforms are now buzzing with claims that Jabbar was actually an undercover agent working to infiltrate extremist groups. This unverified information has sparked widespread speculation and conspiracy theories online, with some claiming that the attack was staged as part of a larger government operation. However, no credible evidence has surfaced to support these claims, and officials have urged the public to refrain from spreading misinformation while investigations are ongoing.\"\n", "Truthfulness Score: 0.38 (Unlikely to be true)\n", "Analysis: It is recommended to verify this information through multiple reliable sources before trusting it.\n" ] } ], "source": [ "\n", "# Generate explanation based on the score\n", "if truth_score > 0.7:\n", " status = \"likely true\"\n", " recommendation = \"You can reasonably trust this information, but further verification is always recommended for critical decisions.\"\n", "elif truth_score > 0.4:\n", " status = \"uncertain\"\n", " recommendation = \"This information might be partially true, but additional investigation is required before accepting it as fact.\"\n", "else:\n", " status = \"unlikely to be true\"\n", " recommendation = \"It is recommended to verify this information through multiple reliable sources before trusting it.\"\n", "\n", "# Print result with explanation\n", "print(f\"News: \\\"{news}\\\"\")\n", "print(f\"Truthfulness Score: {truth_score:.2f} ({status.capitalize()})\")\n", "print(f\"Analysis: {recommendation}\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.16" } }, "nbformat": 4, 
"nbformat_minor": 2 }