lightmate committed
Commit 81f2695 · verified · 1 Parent(s): 1190b8e

added config in readme

Files changed (1):
  1. README.md +113 -101

README.md
---
title: Truthfulness Checker
emoji: 📰
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
license: apache-2.0
---

### **Implementation Steps: Validating Information with Context**

Validating the accuracy or degree of truthfulness of a given piece of information requires **context**—factual and relevant details surrounding the claim. Here’s how we approach the process step by step:

---

### **Step 1: Retrieving Context from a Knowledge Graph Substitute (FAISS with Semantic Search)**
Instead of relying on a **traditional Knowledge Graph (KG)**, we use **FAISS (Facebook AI Similarity Search)**, a **faster, more scalable, and more flexible alternative** for semantic search.

#### **Why FAISS Is Better than a Traditional KG**
1. **Sentence-Level Retrieval**: Unlike traditional KGs, which rely on pre-defined **entities and relationships**, FAISS uses dense **embeddings** to match the **semantic meaning** of entire sentences directly.
2. **Scalable and High-Speed Retrieval**: FAISS efficiently handles **millions of embeddings**, making it highly scalable for real-world applications.
3. **Flexibility**: It works with **unstructured text**, removing the need to pre-process information into entities and relations, which is often time-consuming.
4. **Generalization**: FAISS supports **approximate nearest neighbor (ANN) search**, retrieving contextually related results even when they are not exact matches.

#### **Dataset Used**
We leverage the **News Category Dataset** ([Kaggle Link](https://www.kaggle.com/datasets/rmisra/news-category-dataset)), which contains **news headlines and short descriptions** across various categories.

**Why This Dataset?**
- It covers a **wide range of topics**, making it useful for general-purpose context building.
- Headlines and descriptions provide **rich semantic embeddings** for similarity searches.
- Categories allow filtering relevant results if required (e.g., "science" or "technology").

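A minimal sketch of how such an index could be built from this dataset is shown below. The local file name, the `headline`/`short_description` field names, the exact-search index type, and the `news_context.index` output path are assumptions for illustration, not a description of the repository's actual code.

```python
import json

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Encoder named in the Process list below; reused here for the corpus.
model = SentenceTransformer("all-MiniLM-L6-v2")

# The Kaggle release is a JSON-lines file; the file name and the
# "headline"/"short_description" field names are assumptions.
texts = []
with open("News_Category_Dataset_v3.json", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        texts.append(f"{record['headline']}. {record['short_description']}")

# Encode to dense vectors and store them in a simple exact-search index;
# an ANN index (e.g. IndexIVFFlat) can be swapped in for larger corpora.
embeddings = model.encode(texts, convert_to_numpy=True, show_progress_bar=True)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings.astype(np.float32))
faiss.write_index(index, "news_context.index")
```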
**Process:**
1. We use **SentenceTransformer (all-MiniLM-L6-v2)** to generate an embedding for the query (the input news).
2. We search against pre-computed embeddings stored in a **FAISS index** to retrieve the **top-K most relevant entries**.
3. These results form the **initial context**, capturing related information already present in the dataset.

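At query time, the retrieval could look like the hedged sketch below; it reuses the hypothetical `texts` list and index file from the previous sketch, and the `retrieve_context` name and `top_k=5` default are illustrative rather than the app's actual identifiers.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
index = faiss.read_index("news_context.index")  # index file from the previous sketch

def retrieve_context(claim: str, texts: list[str], top_k: int = 5) -> list[str]:
    """Return the top_k dataset entries most similar to the input claim."""
    query = model.encode([claim], convert_to_numpy=True).astype(np.float32)
    _, ids = index.search(query, top_k)           # distances, row indices
    return [texts[i] for i in ids[0] if i != -1]  # -1 marks an empty slot

# Example:
# faiss_context = retrieve_context(input_news, texts)
```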
---

### **Step 2: Online Search for Real-Time Context**
To **augment** the context retrieved from FAISS, we incorporate **real-time online search** using an API.

#### **Why Is Online Search Critical?**
- **Fresh Information**: News and facts evolve, especially in areas like **science, technology, or politics**. Online search ensures access to the **latest updates** that may not exist in the static dataset.
- **Diverse Sources**: It broadens the scope by pulling information from **multiple credible sources**, reducing bias and enhancing reliability.
- **Fact-Checking**: Search engines often index **trusted fact-checking websites** that we can incorporate into the context.

**Process:**
1. Use an API with a **search query** derived from the input news.
2. Retrieve relevant snippets, headlines, or summaries.
3. Append these results to the **context** built using FAISS.

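As a rough sketch of this step, the helper below calls a generic web-search HTTP endpoint. The URL, API key, query parameters, and response schema are all placeholders, since the README does not name a specific search provider.

```python
import requests

# Placeholders: substitute the real endpoint, key, parameters, and response
# schema of whichever search API the app actually uses.
SEARCH_URL = "https://search.example.com/api"
API_KEY = "YOUR_API_KEY"

def search_online(query: str, max_results: int = 5) -> list[str]:
    """Return text snippets from a web-search API for the given query."""
    response = requests.get(
        SEARCH_URL,
        params={"q": query, "count": max_results, "key": API_KEY},
        timeout=10,
    )
    response.raise_for_status()
    # Assumed response shape: {"results": [{"title": ..., "snippet": ...}, ...]}
    return [
        f"{item['title']}: {item['snippet']}"
        for item in response.json().get("results", [])
    ]
```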
---

### **Step 3: Building Context from Combined Sources**
The FAISS retrieval results and the **online search results** are combined into a **single context string**. This provides a **comprehensive knowledge base** around the input information.

**Why Combine Both?**
- FAISS offers **pre-indexed knowledge**—ideal for **static facts** or concepts.
- Online search complements it with **dynamic and up-to-date insights**—perfect for verifying **recent developments**.

This layered context improves the model’s ability to assess the **truthfulness** of the given information.

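A minimal sketch of this combination step, reusing the hypothetical helpers from the earlier sketches, might be as simple as:

```python
def build_context(faiss_hits: list[str], online_snippets: list[str]) -> str:
    """Merge static (dataset) and dynamic (web) evidence into one string."""
    return " ".join(faiss_hits + online_snippets)
```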
---

### **Step 4: Truthfulness Prediction with a Zero-Shot Classification Model**
We use **facebook/bart-large-mnli**, a **zero-shot classification** model, for evaluation.

#### **Why BART-Large-MNLI?**
1. **Zero-Shot Capability**: It can handle claims and hypotheses without **task-specific training**—perfect for this flexible, multi-domain use case.
2. **Contextual Matching**: It compares the input claim (news) with the constructed context to assess **semantic consistency**.
3. **High Accuracy**: It is pre-trained on **natural language inference (NLI)** tasks, making it adept at recognizing relationships like **entailment** and **contradiction**.
4. **Multi-Label Support**: It can evaluate multiple labels simultaneously, ideal for expressing **degrees of truthfulness**.

**Process:**
1. Provide the constructed **context** as the premise and the input **news** as the claim (hypothesis) to be checked against it.
2. Compute a **truthfulness score** between **0 and 1**, where:
   - **0**: Completely **false**.
   - **1**: Completely **true**.
3. Generate **explanations** based on the score and suggest actions (e.g., further verification if uncertain).

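A hedged sketch of this step with the Hugging Face `transformers` pipeline is shown below. Concatenating context and claim into one sequence and using the candidate labels `"true"`/`"false"` are illustrative choices, not necessarily the app's exact prompt or label set.

```python
from transformers import pipeline

# Zero-shot classifier named in Step 4.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def truthfulness_score(claim: str, context: str) -> float:
    """Score (0-1) how strongly the gathered context supports the claim."""
    result = classifier(
        f"Context: {context} Claim: {claim}",
        candidate_labels=["true", "false"],  # illustrative label set
    )
    scores = dict(zip(result["labels"], result["scores"]))
    return scores["true"]
```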
---

### **End-to-End Example**
**Input News:**
"Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment."

**Context Built:**
- **FAISS Search:** Finds prior research on **quantum time reversal** and **entanglement theories**.
- **Online Search:** Retrieves recent articles discussing **quantum breakthroughs** and expert views.

**Model Evaluation:**
- The model compares the news with the combined context and outputs:
  **Score: 0.72** (Likely True)

**Result Explanation:**
```plaintext
News: "Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment."
Truthfulness Score: 0.72 (Likely true)
Analysis: You can reasonably trust this information, but further verification is always recommended for critical decisions.
```

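For reference, the hypothetical helpers sketched in the earlier steps could be wired together roughly as follows; the 0.6/0.4 cut-offs used to turn the score into a label are assumptions for illustration, not the app's actual thresholds.

```python
input_news = "Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment."

# Compose the steps: FAISS retrieval, online search, context merge, scoring.
faiss_hits = retrieve_context(input_news, texts, top_k=5)
online_snippets = search_online(input_news)
context = build_context(faiss_hits, online_snippets)

score = truthfulness_score(input_news, context)
# Illustrative thresholds for mapping the score to a verdict.
label = "Likely true" if score >= 0.6 else "Uncertain" if score >= 0.4 else "Likely false"
print(f"Truthfulness Score: {score:.2f} ({label})")
```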
---

### **Why This Approach Works**
1. **Balanced Context**: Combines static knowledge (the KG substitute) with dynamic knowledge (real-time search).
2. **Model Flexibility**: The zero-shot model adapts to diverse topics without retraining.
3. **Scalable and Cost-Effective**: Uses pre-trained models, FAISS indexing, and simple APIs for implementation.
4. **Interpretability**: Outputs include confidence scores and explanations for transparency.

This modular approach ensures that the **truthfulness assessment** is **scalable**, **explainable**, and **adaptable** to new domains.