Lack of generalisability
Have you tried this model with real examples like "There is war between Ukraine and Russia", "Barack Obama was USA president", "Israel is at war in Gaza", "COVID is a virus", etc.? When I tested it with real-world news statements, it made wrong predictions: it classified almost all of the given sentences as FAKE. Please correct me if I am wrong. I found the same issue with Kaggle notebooks that claim more than 97% accuracy and use RoBERTa, BERT and other models fine-tuned on LIAR, fake-real and similar datasets. These are some Kaggle examples: "Fake vs Real News Detection | BERT | Acc: 100%", "Fake News Detector: EDA & Prediction(99+%)", "News classification 97%-f1 mlflow pytorch dagshub", "Fake-News Cleaning+Word2Vec+LSTM (99% Accuracy)", "Fake News Classification (Easiest 99% accuracy)", "True and Fake News || LSTM accuracy:97.90%".
You need to give the input as title + context.
It will also work with real-time news articles.
Can you share one or two real-world examples that you tried, related to recent occurrences (the Russia-Ukraine war, news about AI, quantum computing, Barack Obama, or anything else not directly related to the training dataset), along with their outputs? Even though I tried with title + context, I got FAKE most of the time. I want to understand whether this is an issue on my side or a lack of generalisability. I would also like to know whether you face the same issue with the notebooks I mentioned above.
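One way to separate "issue on my side" from "lack of generalisability" is to run a small hand-labelled set of out-of-dataset statements through the model and look at both the accuracy and how often each label is predicted. The following is only a minimal sketch: `summarize_predictions` and `always_fake` are hypothetical names, and `always_fake` is a stand-in that should be replaced with a call to the real model.

```python
from collections import Counter

def summarize_predictions(predict, examples):
    """Run `predict` on each (text, expected_label) pair and summarize the results."""
    predicted = [predict(text) for text, _ in examples]
    correct = sum(p == expected for p, (_, expected) in zip(predicted, examples))
    return {
        "accuracy": correct / len(examples),
        # How often each label was predicted; a single dominant label suggests bias
        "label_counts": dict(Counter(predicted)),
    }

# Hypothetical stand-in that always answers "Fake"; replace with the real model call.
def always_fake(text):
    return "Fake"

# Hand-labelled statements that are not taken from any training dataset
examples = [
    ("Barack Obama was USA president", "Real"),
    ("COVID is a virus", "Real"),
    ("Aliens landed in New York City and declared a global takeover.", "Fake"),
]

report = summarize_predictions(always_fake, examples)
print(report)
```

If nearly every statement ends up under one label regardless of its content, that points to the model rather than to the way the input was formatted.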
It works with real-time news articles.
Can you please share an example that you tried? I also suggest reading this paper: "Exploring the Generalisability of Fake News Detection Models".
N. Hoy and T. Koulouri, "Exploring the Generalisability of Fake News Detection Models," 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 2022, pp. 5731-5740, doi: 10.1109/BigData55660.2022.10020583.
It will also work with real-time news articles.
I would actually like to see some examples that you tried with this code.
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("Pavan48/fake_news_detection_roberta")
model = AutoModelForSequenceClassification.from_pretrained("Pavan48/fake_news_detection_roberta")
Title:
Airports shut in Russia's Kazan after Ukrainian drones hit buildings.
Context:
Russia's Kazan airport has temporarily halted flight arrivals and departures, Russia's aviation watchdog Rosaviatsia said via the Telegram messaging app on Saturday, following a Ukrainian drone attack on the city. Russian state news agencies reported the drone attack on a residential complex in Kazan. The TASS agency said eight drone strikes had been recorded, including six on residential structures. There were no casualties reported, agencies said, citing local authorities. The Russian authorities also claimed to have shot down Ukrainian drones.
Check the above news article
import torch

# Run on the GPU when one is available, otherwise on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def infer_fake_news(title, text):
    # Combine title and text into one string for the model input
    combined_text = title + " " + text
    inputs = tokenizer(combined_text, return_tensors="pt", padding="max_length", truncation=True, max_length=512)
    input_ids = inputs["input_ids"].to(device)
    attention_mask = inputs["attention_mask"].to(device)
    with torch.no_grad():
        outputs = model(input_ids, attention_mask=attention_mask)
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=-1).item()
    # Label mapping used by this snippet: class 1 is "Real", anything else is "Fake"
    label = "Real" if predicted_class == 1 else "Fake"
    return label
Use the above code snippet after loading the model
Can you please check this example?
# Test the function with a fake news example about aliens
title = "Aliens landed in New York City and declared a global takeover."
context = """Reports suggest that a UFO landed in Central Park, New York City, and a group of extraterrestrial beings stepped out, claiming Earth as their new colony.
Eyewitnesses described the aliens as having glowing green skin and large heads.
Local authorities are said to be in communication with the beings, while videos of the event flood social media.
No government confirmation has been provided yet, fueling widespread panic among residents."""
result = infer_fake_news(title, context)
print("Prediction: ", result)
Output I got: Prediction: Real
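To dig into a result like this, it can help to look at the class probability and at the label map stored with the model, instead of relying on a hard-coded "1 means Real". The following is a minimal sketch using only the standard library: `describe_prediction` is a hypothetical helper, and the logits and label map in the example call are made up for illustration.

```python
import math

def describe_prediction(logits, id2label):
    """Turn raw logits into (label, probability) using a label map."""
    # Numerically stable softmax over the class logits
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Made-up logits and label map, for illustration only
label, prob = describe_prediction([0.2, 1.3], {0: "Fake", 1: "Real"})
print(label, round(prob, 3))
```

With the loaded model, the inputs would be `outputs.logits[0].tolist()` and `model.config.id2label`. Whether this checkpoint's config holds meaningful names or only generic ones such as `LABEL_0` and `LABEL_1` is not confirmed here, so it is worth printing the map once before trusting the labels.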