---
license: cc
datasets:
- vector-institute/newsmediabias-plus
language:
- en
tags:
- bias
- classification
- llm
- multimodal
base_model:
- meta-llama/Llama-3.2-1B
---

# Llama3.2 NLP Bias Classifier

This model merges a custom adapter into the base Llama-3.2 model to classify text by disinformation likelihood, separating manipulative content from unbiased sources (reported accuracy: 87%, see Model Performance). It detects rhetorical techniques commonly used in disinformation and labels each text 'Likely' or 'Unlikely' based on a structured checklist of indicators.

## Model Details

- **Base Model**: [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- **Deployment Environment**: Configured for GPU (CUDA) support.
- **Training Data**: https://huggingface.co/datasets/vector-institute/newsmediabias-plus
- **Sampled data for inference**: https://huggingface.co/vector-institute/Llama3.2-Multimodal-Newsmedia-Bias-Detector/blob/main/sampled-data/sample_dataset.csv

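The sampled-data link above points at the Hub's file viewer (`/blob/`), which serves an HTML page rather than the CSV itself. The sketch below rewrites such a link into the `/resolve/` form that the Hub uses for direct downloads. The helper name `to_resolve_url` and the constants are illustrative, not part of this repository.

```python
def to_resolve_url(blob_url: str) -> str:
    """Turn a Hub file-viewer URL (/blob/) into a direct-download URL (/resolve/)."""
    if "/blob/" not in blob_url:
        raise ValueError(f"not a Hub blob URL: {blob_url}")
    # Replace only the first occurrence so file paths containing '/blob/' stay intact
    return blob_url.replace("/blob/", "/resolve/", 1)


# The sampled-data link from the Model Details list
SAMPLE_BLOB_URL = (
    "https://huggingface.co/vector-institute/"
    "Llama3.2-Multimodal-Newsmedia-Bias-Detector/"
    "blob/main/sampled-data/sample_dataset.csv"
)
SAMPLE_CSV_URL = to_resolve_url(SAMPLE_BLOB_URL)
print(SAMPLE_CSV_URL)
```

Assuming the file is publicly readable, `pd.read_csv(SAMPLE_CSV_URL)` should load it directly. Alternatively, download it once and save it as `sample_dataset.csv` next to the usage script.
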
## Model Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from tqdm import tqdm
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

LLAMA_MODEL_HF_ID = "vector-institute/Llama3.2-NLP-Newsmedia-Bias-Detector"

# Load tokenizer; reuse the EOS token for padding
print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(LLAMA_MODEL_HF_ID)
tokenizer.pad_token = tokenizer.eos_token

# Load the merged model in half precision; device_map="auto" picks the device
print("Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    LLAMA_MODEL_HF_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)
model.eval()


def generate_response(model, prompt):
    """Generate a completion for the prompt and return only the newly generated text."""
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=1024
    ).to(model.device)  # keep inputs on the same device as the model
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs['input_ids'],
            attention_mask=inputs['attention_mask'],
            max_new_tokens=50,
            temperature=0.7,
            do_sample=True,
            top_p=0.95
        )
    # Drop the prompt tokens so only the continuation is decoded
    generated_text = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
    return generated_text.strip()


# Load your test dataset
# https://huggingface.co/vector-institute/Llama3.2-Multimodal-Newsmedia-Bias-Detector/blob/main/sampled-data/sample_dataset.csv
print("Loading test dataset...")
df = pd.read_csv('sample_dataset.csv')

# Keep only rows whose 'final_label' is 'Likely' or 'Unlikely'
df = df[df['final_label'].isin(['Likely', 'Unlikely'])]

# Balance the dataset: up to 10 samples per class
likely_samples = df[df['final_label'] == 'Likely']
unlikely_samples = df[df['final_label'] == 'Unlikely']

num_samples_per_category = min(10, len(likely_samples), len(unlikely_samples))

likely_selected = likely_samples.sample(n=num_samples_per_category, random_state=42)
unlikely_selected = unlikely_samples.sample(n=num_samples_per_category, random_state=42)

balanced_samples = pd.concat([likely_selected, unlikely_selected]).reset_index(drop=True)


# Build the prompt and the reference response for one dataset row
def format_data(sample):
    prompt = (
        "Assess the text below for potential disinformation by identifying the presence of rhetorical techniques listed.\n"
        "If you find some of the listed rhetorical techniques below, then the article is likely disinformation; if not, it is likely not disinformation.\n\n"
        "Rhetorical Techniques Checklist:\n"
        "- Emotional Appeal: Uses language or imagery that intentionally invokes extreme emotions like fear or anger, aiming to distract from lack of factual backing.\n"
        "- Exaggeration and Hyperbole: Makes claims that are unsupported by evidence, or presents normal situations as extraordinary to manipulate perceptions.\n"
        "- Bias and Subjectivity: Presents information in a way that unreasonably favors one perspective, omitting key facts that might provide balance.\n"
        "- Repetition: Uses repeated messaging of specific points or misleading statements to embed a biased viewpoint in the reader's mind.\n"
        "- Specific Word Choices: Employs emotionally charged or misleading terms to sway opinions subtly, often in a manipulative manner.\n"
        "- Appeals to Authority: References authorities who lack relevant expertise or cites sources that do not have the credentials to be considered authoritative in the context.\n"
        "- Lack of Verifiable Sources: Relies on sources that either cannot be verified or do not exist, suggesting a fabrication of information.\n"
        "- Logical Fallacies: Engages in flawed reasoning such as circular reasoning, strawman arguments, or ad hominem attacks that undermine logical debate.\n"
        "- Conspiracy Theories: Propagates theories that lack proof and often contain elements of paranoia or implausible scenarios as facts.\n"
        "- Inconsistencies and Factual Errors: Contains multiple contradictions or factual inaccuracies that are easily disprovable, indicating a lack of concern for truth.\n"
        "- Selective Omission: Deliberately leaves out crucial information that is essential for a fair understanding of the topic, skewing perception.\n"
        "- Manipulative Framing: Frames issues in a way that leaves out alternative perspectives or possible explanations, focusing only on aspects that support a biased narrative.\n\n"
        f"{sample['first_paragraph']}\n\n"
        "Respond ONLY with the classification 'Likely (1)' or 'Unlikely (0)' without any additional explanation."
    )
    response = f"This text should be classified as: {'Likely (1)' if sample['final_label'] == 'Likely' else 'Unlikely (0)'}"
    return {"prompt": prompt, "response": response, "text": sample['first_paragraph'], "actual_label": sample['final_label']}


test_samples = [format_data(sample) for _, sample in balanced_samples.iterrows()]

# Generate predictions and collect results
print("Generating predictions...")
results = []

for sample in tqdm(test_samples, desc="Processing samples"):
    prompt = sample["prompt"]
    true_label = 1 if "Likely (1)" in sample["response"] else 0

    # Generate response using the merged model;
    # any output without the exact string "Likely (1)" is counted as 0
    merged_response = generate_response(model, prompt)
    merged_predicted_label = 1 if "Likely (1)" in merged_response else 0

    # Save results
    results.append({
        "text": sample["text"],
        "actual_label": true_label,
        "merged_response": merged_response,
        "merged_predicted_label": merged_predicted_label
    })

# Convert results to DataFrame
results_df = pd.DataFrame(results)
results_df.to_csv('nlp-results.csv', index=False)

# Display metrics
labels = ['Unlikely (0)', 'Likely (1)']
y_true = results_df['actual_label']
y_pred = results_df['merged_predicted_label']
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1], zero_division=0
)
for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name}: precision={p:.2%}, recall={r:.2%}, F1={f:.2%}")
print(f"Accuracy: {accuracy_score(y_true, y_pred):.2%}")

# Optional: Print some example predictions
for i in range(min(5, len(results_df))):  # Adjust the range as needed
    sample = results_df.iloc[i]
    print(f"\nExample {i+1}:")
    print(f"Text: {sample['text']}")
    print(f"Actual Label: {'Likely (1)' if sample['actual_label'] == 1 else 'Unlikely (0)'}")
    print(f"Merged Model Prediction: {sample['merged_response']}")
```

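The usage script counts a response as 'Likely' only when it contains the exact string `Likely (1)`. Any other output, including malformed or empty generations, is silently counted as 'Unlikely'. As a sketch of a stricter alternative, the hypothetical helper `parse_label` below returns `None` when no label word is found, so unparseable responses can be tracked separately. It is an illustration, not the parsing used to produce the reported results.

```python
import re
from typing import Optional

# Matches the whole words "likely" / "unlikely"; the word boundary keeps
# "likely" from matching inside "Unlikely".
_LABEL_RE = re.compile(r"\b(unlikely|likely)\b", re.IGNORECASE)


def parse_label(response: str) -> Optional[int]:
    """Return 1 for 'Likely', 0 for 'Unlikely', or None if neither word appears."""
    match = _LABEL_RE.search(response)
    if match is None:
        return None
    return 1 if match.group(1).lower() == "likely" else 0


print(parse_label("This text should be classified as: Likely (1)"))  # 1
print(parse_label("Unlikely (0)"))  # 0
print(parse_label("I cannot tell."))  # None
```

The first label word in the response decides the result, so a response that mentions both words is classified by whichever comes first.
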
## Dataset and Evaluation

- **Input Dataset**: Sample data from `sample_dataset.csv`, containing balanced examples of 'Likely' and 'Unlikely' disinformation.
- **Labeling Criteria**: Text is classified as 'Likely' or 'Unlikely' disinformation based on the presence of rhetorical techniques (e.g., exaggeration, emotional appeal).
- **Metrics**: Precision, recall, F1 score, and accuracy, computed with `sklearn.metrics`.

### Model Performance

| Label          | Precision | Recall | F1 Score |
|----------------|-----------|--------|----------|
| Unlikely (0)   | 78%       | 82%    | 79.95%   |
| Likely (1)     | 81%       | 85%    | 82.95%   |
| **Accuracy**   |           |        | 87%      |

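The F1 column is the harmonic mean of precision and recall. As a quick check, the sketch below recomputes it from the precision and recall values in the table. The function name `f1_from_pr` is illustrative.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above
reported = {"Unlikely (0)": (0.78, 0.82), "Likely (1)": (0.81, 0.85)}
for label, (p, r) in reported.items():
    print(f"{label}: F1 = {f1_from_pr(p, r):.2%}")
# Unlikely (0): F1 = 79.95%
# Likely (1): F1 = 82.95%
```
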
### Example Classification

```plaintext
Example 1:
Text: "This new vaccine causes severe side effects in a majority of patients, which is something the authorities don’t want you to know."
Actual Label: Likely (1)
Model Prediction: Likely (1)
```

## Limitations and Future Work

- **False Positives**: The model may flag subjective statements as 'Likely' even when they contain no explicit disinformation techniques.
- **Inference Speed**: Optimizing for deployment on different devices could improve real-time applicability.

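To inspect the false-positive limitation on your own data, one option is to pull out the rows that were predicted 'Likely' but labeled 'Unlikely'. The sketch below assumes rows shaped like the `results` list built by the usage script, with keys `actual_label` and `merged_predicted_label`. The helper names and the toy rows are illustrative only.

```python
from typing import Dict, List


def false_positives(results: List[Dict]) -> List[Dict]:
    """Rows predicted 'Likely' (1) whose actual label is 'Unlikely' (0)."""
    return [
        row for row in results
        if row["merged_predicted_label"] == 1 and row["actual_label"] == 0
    ]


def false_positive_rate(results: List[Dict]) -> float:
    """Share of actual 'Unlikely' rows that were predicted 'Likely'."""
    negatives = [row for row in results if row["actual_label"] == 0]
    if not negatives:
        return 0.0
    return len(false_positives(results)) / len(negatives)


# Toy rows for illustration only; not real model output
toy_results = [
    {"text": "a", "actual_label": 0, "merged_predicted_label": 1},
    {"text": "b", "actual_label": 0, "merged_predicted_label": 0},
    {"text": "c", "actual_label": 1, "merged_predicted_label": 1},
    {"text": "d", "actual_label": 1, "merged_predicted_label": 0},
]
print([row["text"] for row in false_positives(toy_results)])  # ['a']
print(false_positive_rate(toy_results))  # 0.5
```

Reading the `text` field of the returned rows is a simple way to see which kinds of subjective statements get misclassified.
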
## Citation

If you use this model, please cite our work as follows:

```
@inproceedings{Raza2024LlamaBiasClassifier,
  title={Llama3.2 NLP Bias Classifier for Disinformation Detection},
  author={Shaina Raza},
  year={2024}
}
```

For more information, contact Shaina Raza, PhD at [email protected]