---
license: apache-2.0
language:
- en
metrics:
- accuracy
library_name: transformers
tags:
- misinformation
- fake news
- vlm
- mllm
- llm
---

# Model Card

SNIFFER is a multimodal large language model built for out-of-context (OOC) misinformation detection and explanation. It applies two-stage instruction tuning to [InstructBLIP](https://huggingface.co/Salesforce/instructblip-vicuna-13b): news-domain alignment followed by task-specific tuning.

The full model is composed of three parts: 1) _internal checking_, which analyzes the consistency of the image and text content; 2) _external checking_, which analyzes the relevance between the context of the retrieved image and the provided text; and 3) _composed reasoning_, which combines the two analyses to arrive at a final judgment and explanation.

This checkpoint is the one used for the _internal checking_ part.

## Model Sources

- **Paper:** https://arxiv.org/abs/2403.03170 (to appear in CVPR 2024)
- **Project:** https://pengqi.site/Sniffer/
- **Repository:** https://github.com/MischaQI/Sniffer

## Results

Dataset: [NewsCLIPpings](https://github.com/g-luo/news_clippings). Metric: accuracy (%).

| Model                 | All  | Fake | Real |
| :-------------------- | :--- | :--- | :--- |
| SAFE                  | 52.8 | 54.8 | 52.0 |
| EANN                  | 58.1 | 61.8 | 56.2 |
| VisualBERT            | 58.6 | 38.9 | 78.4 |
| CLIP                  | 66.0 | 64.3 | 67.7 |
| DT-Transformer        | 77.1 | 78.6 | 75.6 |
| CCN                   | 84.7 | 84.8 | 84.5 |
| Neu-Sym detector      | 68.2 | -    | -    |
| **SNIFFER (ours)**    | **88.4** | **86.9** | **91.8** |

## Citation

```
@inproceedings{qi2023sniffer,
  author    = {Qi, Peng and Yan, Zehong and Hsu, Wynne and Lee, Mong Li},
  title     = {SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2024}
}
```
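
## Metric Sketch

The results table reports three numbers per model. As a minimal sketch of how such numbers could be computed, the snippet below assumes that "All" is accuracy over every test sample and that "Fake" and "Real" are accuracy restricted to samples whose ground-truth label is fake or real, respectively. The function name `per_class_accuracy` and the label convention (1 = fake, 0 = real) are illustrative assumptions, not part of the SNIFFER codebase.

```python
from typing import Dict, List


def per_class_accuracy(labels: List[int], preds: List[int]) -> Dict[str, float]:
    """Return accuracy (%) over all samples and per ground-truth class.

    Assumed label convention (hypothetical): 1 = fake (out-of-context), 0 = real.
    """
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")

    def accuracy(indices: List[int]) -> float:
        # An empty subset has no defined accuracy.
        if not indices:
            return float("nan")
        correct = sum(1 for i in indices if labels[i] == preds[i])
        return 100.0 * correct / len(indices)

    all_idx = list(range(len(labels)))
    fake_idx = [i for i in all_idx if labels[i] == 1]
    real_idx = [i for i in all_idx if labels[i] == 0]
    return {
        "All": accuracy(all_idx),
        "Fake": accuracy(fake_idx),
        "Real": accuracy(real_idx),
    }


# Toy data for illustration only; these are not NewsCLIPpings results.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_preds = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = per_class_accuracy(toy_labels, toy_preds)
print(toy_scores)
```

Splitting accuracy by class shows whether a detector leans toward one label. The VisualBERT row in the table, for example, pairs a high "Real" score with a low "Fake" score.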