Trained on 2.7M samples across 4,803 generators (see Training Data)

Uploaded for community validation as part of OpenSight, an upcoming open-source framework for adaptive deepfake detection, inspired by the methodology of "Community Forensics" (arXiv:2411.04125).

Huggingface Spaces coming soon.

Model Details

Model Description

Vision Transformer (ViT) model trained on the largest dataset to date for detecting AI-generated images in forensic applications.

  • Developed by: Jeongsoo Park and Andrew Owens, University of Michigan
  • Model type: Vision Transformer (ViT-Small)
  • License: MIT (compatible with the CreativeML OpenRAIL-M license referenced in the original paper)
  • Finetuned from: timm/vit_small_patch16_384.augreg_in21k_ft_in1k

Model Sources

Uses

Direct Use

Detect AI-generated images in:

  • Content moderation pipelines
  • Digital forensic investigations

Bias, Risks, and Limitations

  • Performance variance: Accuracy drops 15-20% on diffusion-generated images vs. GAN-generated images
  • Geometric artifacts: Struggles with rotated/flipped synthetic images
  • Data bias: Trained primarily on LAION and COCO derivatives (see the original paper)
  • Added by uploader: The model is already dated and may fail to detect images from newer-generation models.

Compatibility Notice

This repository contains a Hugging Face transformers-compatible conversion of the original detection methodology from:

Original Work
"Community Forensics: Using Thousands of Generators to Train Fake Image Detectors"
arXiv:2411.04125 {{Citation from 2411.04125v1.pdf}}

Our Contributions (Coming soon)
⎯ Conversion of original weights to HF format
⎯ Added PyTorch inference pipeline
⎯ Standardized model card documentation

No Training Performed
⎯ Initial model weights sourced from paper authors
⎯ No architectural changes or fine-tuning applied

Verify Original Performance
Please refer to Table 3 of the original paper (arXiv:2411.04125) for baseline metrics.

How to Use

from PIL import Image
from transformers import ViTImageProcessor, ViTForImageClassification

# Replace "[your_model_id]" with this repository's id on the Hub.
processor = ViTImageProcessor.from_pretrained("[your_model_id]")
model = ViTForImageClassification.from_pretrained("[your_model_id]")

image = Image.open("path/to/image.jpg")  # image to check (placeholder path)

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()
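To turn the raw logits into a label with a confidence score, apply a softmax. The sketch below uses hypothetical logits and a hypothetical label order (0 = "real", 1 = "fake") for illustration only; check `model.config.id2label` for the actual mapping.

```python
import math

# Hypothetical logits for one image; the model emits a [1, num_labels] tensor.
logits = [1.2, 3.4]

# Softmax: exponentiate and normalize so the scores sum to 1.
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

pred = max(range(len(probs)), key=probs.__getitem__)  # most likely class index
labels = {0: "real", 1: "fake"}                       # ASSUMED mapping
print(labels[pred], round(probs[pred], 3))            # prints: fake 0.9
```

With real model outputs, `torch.softmax(outputs.logits, dim=-1)` does the same conversion in one call.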

Training Details

Training Data

  • 2.7M images from 15+ generator families (4,600+ individual models)
  • Over 1.15 TB of images

Training Hyperparameters

  • Framework: PyTorch 2.0
  • Precision: bf16 mixed
  • Optimizer: AdamW (lr=5e-5)
  • Epochs: 10
  • Batch Size: 32
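The hyperparameters above can be sketched as a PyTorch 2.0 training step. This is a minimal illustration, not the authors' code: the model is a stand-in linear layer, and the data, schedule, and loss are placeholders.

```python
import torch

model = torch.nn.Linear(384, 2)  # stand-in for the ViT-Small backbone
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # AdamW, lr=5e-5 as reported

# One bf16 mixed-precision step at the reported batch size of 32.
x = torch.randn(32, 384)
y = torch.randint(0, 2, (32,))
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```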

Evaluation

Testing Data

  • 10k held-out images (5k real/5k synthetic) from unseen Diffusion/GAN models
Metric     Value
Accuracy   97.2%
F1 Score   0.968
AUC-ROC    0.992
FP Rate    2.1%
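These metrics follow directly from a confusion matrix. The counts below are hypothetical, chosen only to be consistent with the reported accuracy (97.2%) and FP rate (2.1%) on 5k real / 5k synthetic images; they are not the authors' actual confusion matrix.

```python
# Hypothetical counts: tp/fn over 5k synthetic images, tn/fp over 5k real images.
tp, tn, fp, fn = 4825, 4895, 105, 175

accuracy = (tp + tn) / (tp + tn + fp + fn)   # fraction of all images classified correctly
fp_rate = fp / (fp + tn)                     # real images wrongly flagged as synthetic
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, fp_rate, round(f1, 3))       # prints: 0.972 0.021 0.972
```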


Citation

BibTeX:

@misc{park2024communityforensics,
    title={Community Forensics: Using Thousands of Generators to Train Fake Image Detectors}, 
    author={Jeongsoo Park and Andrew Owens},
    year={2024},
    eprint={2411.04125},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2411.04125}, 
}

Model Card Authors:

Jeongsoo Park, Andrew Owens
