Model Card
SNIFFER is a multimodal large language model specifically engineered for out-of-context (OOC) misinformation detection and explanation. It is built via two-stage instruction tuning on InstructBLIP, consisting of news-domain alignment followed by task-specific tuning.
The model is composed of three parts: 1) internal checking, which analyzes the consistency between the image and the text content; 2) external checking, which analyzes the relevance between the context of the retrieved image and the provided text; and 3) composed reasoning, which combines the two-pronged analysis to reach a final judgment and explanation.
The checkpoint provided here is for the internal checking part.
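The snippet below is a minimal sketch of how this internal-checking checkpoint could be queried through the Hugging Face InstructBLIP classes. The backbone identifier, checkpoint filename, and prompt wording are illustrative assumptions, not the official setup; refer to the repository linked under Model Sources for the exact inference code and prompt templates.

```python
import torch
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load an InstructBLIP backbone from the Hugging Face hub. SNIFFER is tuned from
# InstructBLIP, but the exact backbone variant assumed here is a placeholder.
processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")
model = InstructBlipForConditionalGeneration.from_pretrained("Salesforce/instructblip-vicuna-7b")
model.to(device)
model.eval()

# Hypothetical: overlay this repository's internal-checking weights on the backbone.
# The filename is a placeholder; see the official repository for the actual layout.
state_dict = torch.load("sniffer_internal_checking.pth", map_location="cpu")
model.load_state_dict(state_dict, strict=False)

# Internal checking: ask whether the image is consistent with the news text.
# The instruction wording below is illustrative, not SNIFFER's official prompt.
image = Image.open("news_image.jpg").convert("RGB")
caption = "Flood waters surround homes on the Florida coast, September 2022."
prompt = (
    f"News text: {caption}\n"
    "Question: Is the image consistent with the news text? "
    "Answer yes or no and explain your reasoning."
)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs, do_sample=False, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip())
```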
Model Sources
- Paper: https://arxiv.org/abs/2403.03170 (to appear in CVPR 2024)
- Project: https://pengqi.site/Sniffer/
- Repository: https://github.com/MischaQI/Sniffer
Results
Dataset: NewsCLIPpings (accuracy, %)

| Model | All | Fake | Real |
|---|---|---|---|
| SAFE | 52.8 | 54.8 | 52.0 |
| EANN | 58.1 | 61.8 | 56.2 |
| VisualBERT | 58.6 | 38.9 | 78.4 |
| CLIP | 66.0 | 64.3 | 67.7 |
| DT-Transformer | 77.1 | 78.6 | 75.6 |
| CCN | 84.7 | 84.8 | 84.5 |
| Neu-Sym detector | 68.2 | - | - |
| SNIFFER (ours) | 88.4 | 86.9 | 91.8 |
Citation
@inproceedings{qi2024sniffer,
  author    = {Qi, Peng and Yan, Zehong and Hsu, Wynne and Lee, Mong Li},
  title     = {SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2024}
}