---
title: Truthfulness Checker
emoji: 📰
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
license: apache-2.0
---

EchoTruth is an AI-powered application that helps you verify the authenticity of news articles in real time. It combines knowledge-graph-style retrieval, online search, and a zero-shot classification model to produce a truthfulness score, an explanation, and sources for the given news input. Because it draws on pre-trained models and live search results, the analysis can reflect recent information.

## How to Clone and Run the Application

### 1. Create a Conda Environment

First, create a new Conda environment with Python 3.10 and activate it:

```bash
conda create --prefix ./env python=3.10
conda activate ./env
```

### 2. Install Required Libraries

Next, install the required libraries from the `requirements.txt` file:

```bash
pip install -r requirements.txt
```

### 3. Use a `.env` File for Secrets

Create a `.env` file in the project root directory and define the following variables:

```env
SEARCH_API_KEY=""
SEARCH_BASE_URL=""
SEARCH_MODEL=""
GEMINI_API_KEY=""
```

Replace the empty strings with your actual API keys and search settings.

### 4. Run the Application

Finally, run the application:

```bash
python app.py
```

This launches the application. You can then verify news authenticity by entering the news text or URL in the provided input box.

---

## How It Works

### Overview

EchoTruth uses AI models to check the authenticity of news in real time: it processes the given news or article and validates it against context gathered from external search and knowledge graph retrieval.

### **Implementation Steps: Validating Information with Context**

Validating the accuracy or degree of truthfulness of a given piece of information requires **context**: factual and relevant details surrounding the claim.
Here’s how we approach this process step-by-step:

---

### **Step 1: Retrieving Context from Knowledge Graph Substitute - FAISS with Semantic Search**

Instead of relying on a **traditional Knowledge Graph (KG)**, we use **FAISS (Facebook AI Similarity Search)**, a **faster, scalable, and flexible alternative** for semantic search.

#### **Why FAISS is Better than a Traditional KG**

1. **Sentence-Level Retrieval**: Unlike traditional KGs that often rely on pre-defined **entities and relationships**, FAISS uses dense **embeddings** to directly match the **semantic meaning** of entire sentences.
2. **Scalable and High-Speed Retrieval**: FAISS efficiently handles **millions of embeddings**, making it highly scalable for real-world applications.
3. **Flexibility**: It works with **unstructured text**, removing the need to pre-process information into entities and relations, which is often time-consuming.
4. **Generalization**: FAISS enables **approximate nearest neighbor (ANN) search**, allowing retrieval of contextually related results, even if they are not exact matches.

#### **Dataset Used**

We leverage the **News Category Dataset** ([Kaggle Link](https://www.kaggle.com/datasets/rmisra/news-category-dataset)), which contains **news headlines and short descriptions** across various categories.

- **Why This Dataset?** It covers a **wide range of topics**, making it useful for general-purpose context building.
- Headlines and descriptions provide **rich semantic embeddings** for similarity searches.
- Categories allow filtering relevant results if required (e.g., "science" or "technology").

**Process:**

1. We use **SentenceTransformer (all-MiniLM-L6-v2)** to generate embeddings for the query (the input news).
2. We search against pre-computed embeddings stored in a **FAISS index** to retrieve the **top-K most relevant entries**.
3. These results form the **initial context**, capturing related information already present in the dataset.
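
The retrieval process above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `load_components` and `retrieve_context` are hypothetical, and the choice of an exact inner-product index (`faiss.IndexFlatIP`) over normalized embeddings is an assumption, since the index type used by EchoTruth is not stated here. Calling `load_components(texts)` once and then `retrieve_context(query, encoder, index, texts)` per query would return the top-K entries with their similarity scores.

```python
from typing import List, Sequence, Tuple


def load_components(texts: Sequence[str]):
    """Build an encoder and a FAISS index (needs faiss and sentence-transformers)."""
    # Imported lazily so retrieve_context below stays free of heavy dependencies.
    import faiss
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    # Unit-length vectors make the inner product equal to cosine similarity.
    vectors = encoder.encode(
        list(texts), convert_to_numpy=True, normalize_embeddings=True
    )
    # Assumption: an exact flat index; a large corpus might use an ANN index instead.
    index = faiss.IndexFlatIP(vectors.shape[1])
    index.add(vectors)
    return encoder, index


def retrieve_context(
    query: str, encoder, index, texts: Sequence[str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return up to k (text, similarity) pairs for the query, best match first."""
    query_vec = encoder.encode(
        [query], convert_to_numpy=True, normalize_embeddings=True
    )
    scores, ids = index.search(query_vec, k)
    # FAISS pads the result with -1 when the index holds fewer than k entries.
    return [(texts[i], float(s)) for s, i in zip(scores[0], ids[0]) if i != -1]
```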

---

### **Step 2: Online Search for Real-Time Context**

To **augment** the context retrieved from FAISS, we incorporate **real-time online search** using an API.

#### **Why Is Online Search Critical?**

- **Fresh Information**: News and facts evolve, especially in areas like **science, technology, or politics**. Online search gives access to the **latest updates** that may not exist in the static dataset.
- **Diverse Sources**: It broadens the scope by pulling information from **multiple credible sources**, reducing bias and enhancing reliability.
- **Fact-Checking**: Search engines often index **trusted fact-checking websites** that we can incorporate into the context.

**Process:**

1. Use an API with a **search query** derived from the input news.
2. Retrieve relevant snippets, headlines, or summaries.
3. Append these results to the **context** built using FAISS.

---

### **Step 3: Building Context from Combined Sources**

Both FAISS-based retrieval and **online search results** are combined into a **single context string**. This provides a **comprehensive knowledge base** around the input information.

- **Why Combine Both?**
  - FAISS offers **pre-indexed knowledge**, ideal for **static facts** or concepts.
  - Online search complements it with **dynamic and up-to-date insights**, well suited to verifying **recent developments**.

This layered context improves the model’s ability to assess the **truthfulness** of the given information.

---

### **Step 4: Truthfulness Prediction with Zero-Shot Classification Model**

We use the **facebook/bart-large-mnli** model, a **zero-shot classification** model, for evaluation.

#### **Why BART-Large-MNLI?**

1. **Zero-Shot Capability**: It can handle claims and hypotheses without needing **task-specific training**, which suits this flexible, multi-domain use case.
2. **Contextual Matching**: It compares the input claim (news) with the constructed context to assess **semantic consistency**.
3. **High Accuracy**: Fine-tuned on the **MultiNLI natural language inference** dataset, making it adept at understanding relationships like **entailment** and **contradiction**.
4. **Multi-Label Support**: Can evaluate multiple labels simultaneously, ideal for **degrees of truthfulness**.

**Process:**

1. Input the **context** as the premise and the **news** claim as the hypothesis.
2. Compute a **truthfulness score** between **0 and 1**, where:
   - **0**: Completely **false**.
   - **1**: Completely **true**.
3. Generate **explanations** based on the score and suggest actions (e.g., further verification if uncertain).

---

### **End-to-End Example**

**Input News:**
"Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment."

**Context Built:**

- **FAISS Search:** Finds prior research on **quantum time reversal** and **entanglement theories**.
- **Online Search:** Retrieves recent articles discussing **quantum breakthroughs** and expert views.

**Model Evaluation:**

- Model compares the news with the combined context and outputs: **Score: 0.72** (Likely True).

**Result Explanation:**

```plaintext
News: "Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment."
Truthfulness Score: 0.72 (Likely true)
Analysis: You can reasonably trust this information, but further verification is always recommended for critical decisions.
```

---

### **Why This Approach Works**

1. **Balanced Context**: Combines static knowledge (KG substitute) and dynamic knowledge (real-time search).
2. **Model Flexibility**: Zero-shot model adapts to diverse topics without retraining.
3. **Scalable and Cost-Effective**: Uses pre-trained models, FAISS indexing, and simple APIs for implementation.
4. **Interpretability**: Outputs include confidence scores and explanations for transparency.

This modular approach keeps the **truthfulness assessment** **scalable**, **explainable**, and **adaptable** to new domains.
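
Steps 3 and 4 above can be sketched as follows. This is an illustration under stated assumptions rather than the code in `app.py`: the helper names (`load_classifier`, `build_context`, `truthfulness_score`, `explain`) are hypothetical, the context is assumed to act as the NLI premise with the news claim as the single hypothesis, and the 0.6 and 0.4 cut-offs in `explain` are made-up thresholds chosen only so that a score of 0.72 maps to "Likely true" as in the example. A typical call would be `explain(truthfulness_score(news, build_context(hits, snippets), load_classifier()))`.

```python
from typing import Callable, Iterable


def load_classifier():
    """Load the real model (needs the transformers package and a model download)."""
    # Imported lazily so the helpers below stay free of heavy dependencies.
    from transformers import pipeline

    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


def build_context(faiss_hits: Iterable[str], search_snippets: Iterable[str]) -> str:
    """Step 3: merge both sources into one context string, skipping duplicates."""
    seen, parts = set(), []
    for text in [*faiss_hits, *search_snippets]:
        text = text.strip()
        if text and text not in seen:
            seen.add(text)
            parts.append(text)
    return " ".join(parts)


def truthfulness_score(claim: str, context: str, classifier: Callable) -> float:
    """Step 4: score how strongly the context supports the claim (0 to 1)."""
    # Assumption: the context is the premise and the claim is the only hypothesis.
    # With a single candidate label, the pipeline scores entailment against
    # contradiction, so the result can be read as a degree of support.
    result = classifier(context, candidate_labels=[claim], hypothesis_template="{}")
    return float(result["scores"][0])


def explain(score: float) -> str:
    """Map a score to a short verdict; the thresholds here are hypothetical."""
    if score >= 0.6:
        return "Likely true"
    if score <= 0.4:
        return "Likely false"
    return "Uncertain"
```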

---

## Jupyter Notebook: `workflow.ipynb`

The `workflow.ipynb` file provides a detailed, step-by-step demonstration of the entire EchoTruth process, from fetching online search results and knowledge graph context to calculating the truthfulness score and generating explanations. The notebook includes examples of both true and fake news, and shows how the application processes these inputs.

To view the workflow:

1. Open the Jupyter notebook.
2. Follow the cells to see how the system fetches data and evaluates news authenticity for both true and fake news examples.