Lack of generalisability

by Hansa23 - opened Dec 22, 2024

Dec 22, 2024

Have you tried this model with real examples like "There is war between Ukraine and Russia", "Barack Obama was USA president", "Israel is at war in Gaza", "COVID is a virus", etc.? When I tested it with real-world news statements, it made wrong predictions: it classified almost all of the given sentences as FAKE. Please correct me if I am wrong.

I found the same issue with Kaggle notebooks that claim more than 97% accuracy, use RoBERTa, BERT and other models, and are fine-tuned on LIAR, fake-real and similar datasets. Some Kaggle examples: "Fake vs Real News Detection | BERT 🤖 | Acc: 100%"; "Fake News Detector: EDA & Prediction(99+%)"; "News classification 97%-f1 mlflow pytorch dagshub"; "Fake-News Cleaning+Word2Vec+LSTM (99% Accuracy)"; "Fake News Classification (Easiest 99% accuracy)"; "True and Fake News || LSTM accuracy:97.90%".

Pavan48

Owner Dec 22, 2024

You should give the input as title + context.

Pavan48

Owner Dec 22, 2024

•

edited Dec 22, 2024

It will work with real time news articles also.

Hansa23

Dec 22, 2024

Can you share one or two real-world examples that you tried, related to recent occurrences (the Russia-Ukraine war, news related to AI, quantum computing, Barack Obama, or anything else not directly related to the training dataset), together with their outputs? Even though I tried with title + context, I got FAKE most of the time. I want to understand whether this is an issue on my side or a lack of generalisability. I would also like to know whether you face the same issue with the notebooks I mentioned above.

Hansa23

Dec 22, 2024

> It is working with real time news articles.

Can you please share an example that you tried? I also suggest reading this paper:
N. Hoy and T. Koulouri, "Exploring the Generalisability of Fake News Detection Models," 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 2022, pp. 5731-5740, doi: 10.1109/BigData55660.2022.10020583.

Hansa23

Dec 22, 2024

> It will work with real time news articles also.

I would actually like to see some examples that you tried with this code:

# Load model directly

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Pavan48/fake_news_detection_roberta")
model = AutoModelForSequenceClassification.from_pretrained("Pavan48/fake_news_detection_roberta")

Pavan48

Owner Dec 22, 2024

•

edited Dec 22, 2024

Title:
Airports shut in Russia's Kazan after Ukrainian drones hit buildings.
Context:
Russia's Kazan airport has temporarily halted flight arrivals and departures, Russia's aviation watchdog Rosaviatsia said via the Telegram messaging app on Saturday, following a Ukrainian drone attack on the city. Russian state news agencies reported the drone attack on a residential complex in Kazan. The TASS agency said eight drone strikes had been recorded, including six on residential structures. There were no casualties reported, agencies said, citing local authorities. The Russian authorities also claimed to have shot down Ukrainian drones.

Check the above news article

Pavan48

Owner Dec 22, 2024

•

edited Dec 22, 2024

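import torch

# Assumes `tokenizer` and `model` were loaded as shown earlier in the thread
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def infer_fake_news(title, text):
    # Combine title and text into one string for the model input
    combined_text = title + " " + text
    inputs = tokenizer(combined_text, return_tensors="pt", padding="max_length", truncation=True, max_length=512)
    input_ids = inputs["input_ids"].to(device)
    attention_mask = inputs["attention_mask"].to(device)

    # Run inference without tracking gradients
    with torch.no_grad():
        outputs = model(input_ids, attention_mask=attention_mask)

    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=-1).item()

    # Class index 1 is treated as "Real", anything else as "Fake"
    label = "Real" if predicted_class == 1 else "Fake"
    return label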

Use the above code snippet after loading the model
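Because the function above returns only the argmax label, a borderline prediction looks the same as a confident one. The following is a minimal sketch of how the logits could be turned into a label plus a probability. It assumes the two-class layout used in the snippet (index 0 = Fake, index 1 = Real); `label_with_confidence` is a hypothetical helper, not part of transformers, and it is written in pure Python so it can be tried without loading the model.

```python
import math

def label_with_confidence(logits, labels=("Fake", "Real")):
    """Return (label, probability) for the highest-scoring class.

    Assumes logits[0] is the Fake score and logits[1] is the Real score,
    matching the mapping used in infer_fake_news.
    """
    # Subtract the maximum before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the largest probability (the first one wins on a tie)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

label, prob = label_with_confidence([-1.0, 2.0])
print(label, round(prob, 3))
```

With the real model, the input would presumably be something like `outputs.logits[0].tolist()`. A probability close to 0.5 suggests the label should not be trusted much either way.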

Hansa23

Dec 23, 2024

Can you please check this example?

# Test the function with a fake news example about aliens

title = "Aliens landed in New York City and declared a global takeover."
context = """Reports suggest that a UFO landed in Central Park, New York City, and a group of extraterrestrial beings stepped out, claiming Earth as their new colony.
Eyewitnesses described the aliens as having glowing green skin and large heads.
Local authorities are said to be in communication with the beings, while videos of the event flood social media.
No government confirmation has been provided yet, fueling widespread panic among residents."""

result = infer_fake_news(title, context)
print("Prediction: ", result)

Output I got: Prediction: Real
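Single examples on either side cannot settle whether the model generalises, so it may help to score a small hand-labelled set of articles from outside the training data in one go. The sketch below is only an illustration: `evaluate` is a hypothetical helper that accepts any `predict(title, text)` callable returning "Real" or "Fake" (for instance `infer_fake_news`), and both the `always_fake` stub and the example rows are placeholders, not real results.

```python
def evaluate(predict, examples):
    """Score predict(title, text) -> "Real"/"Fake" on (title, text, expected) rows."""
    if not examples:
        raise ValueError("examples must not be empty")
    predictions = [predict(title, text) for title, text, _ in examples]
    correct = sum(p == expected for p, (_, _, expected) in zip(predictions, examples))
    # Share of inputs predicted as Fake; a value near 1.0 on mixed data
    # indicates the classifier is collapsing to a single class
    fake_share = sum(p == "Fake" for p in predictions) / len(examples)
    return {"accuracy": correct / len(examples), "fake_share": fake_share}

def always_fake(title, text):
    # Stub standing in for the real model so the sketch runs on its own
    return "Fake"

# Placeholder rows; replace with real, hand-labelled articles
examples = [
    ("Placeholder real headline", "Placeholder body of a real article.", "Real"),
    ("Placeholder fabricated headline", "Placeholder body of a made-up story.", "Fake"),
]

report = evaluate(always_fake, examples)
print(report)
```

Reporting the share of Fake predictions next to accuracy makes the behaviour described at the top of this thread (almost everything classified as FAKE) visible directly.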
