Trained on 2.7M samples across 4,803 generators (see Training Data)
Uploaded for community validation as part of OpenSight, an upcoming open-source framework for adaptive deepfake detection, inspired by the methodology of the Community Forensics paper (arXiv:2411.04125).
A Hugging Face Spaces demo is coming soon.
Model Details
Model Description
Vision Transformer (ViT) model trained on the largest dataset to date for detecting AI-generated images in forensic applications.
- Developed by: Jeongsoo Park and Andrew Owens, University of Michigan
- Model type: Vision Transformer (ViT-Small)
- License: MIT (compatible with the CreativeML OpenRAIL-M license referenced in the paper)
- Finetuned from: timm/vit_small_patch16_384.augreg_in21k_ft_in1k
Model Sources
- Repository: JeongsooP/Community-Forensics
- Paper: arXiv:2411.04125
Uses
Direct Use
Detect AI-generated images in:
- Content moderation pipelines
- Digital forensic investigations
Bias, Risks, and Limitations
- Performance variance: Accuracy drops 15-20% on diffusion-generated images vs GAN-generated
- Geometric artifacts: Struggles with rotated/flipped synthetic images
- Data bias: Trained primarily on LAION and COCO derivatives (see the paper)
- Added by uploader: the model is already out of date and fails to detect images from newer generators.
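Since the model struggles with rotated/flipped images, one common mitigation is test-time augmentation: average the fake score over several geometric transforms of the same input. A minimal, self-contained sketch; `predict_fake_prob` and the toy list-based "image" are hypothetical stand-ins, not part of this repository:

```python
def tta_fake_score(image, predict_fake_prob, transforms):
    """Average the fake-probability over a set of image transforms."""
    scores = [predict_fake_prob(t(image)) for t in transforms]
    return sum(scores) / len(scores)

# Toy demonstration: the "image" is a list; transforms are identity and flip.
identity = lambda img: img
flipped = lambda img: list(reversed(img))
dummy_scorer = lambda img: 0.9 if img[0] == 1 else 0.4  # stand-in predictor

score = tta_fake_score([1, 0, 0], dummy_scorer, [identity, flipped])
```

With a real model, `predict_fake_prob` would run the preprocessing and forward pass shown in the How to Use section, and the transforms would be PIL rotations/flips.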
Compatibility Notice
This repository contains a Hugging Face transformers-compatible conversion of the original detection methodology from:
Original Work
"Community Forensics: Using Thousands of Generators to Train Fake Image Detectors"
arXiv:2411.04125
Our Contributions (Coming soon)
- Conversion of original weights to HF format
- Added PyTorch inference pipeline
- Standardized model card documentation
No Training Performed
- Initial model weights sourced from paper authors
- No architectural changes or fine-tuning applied
Verify Original Performance
Please refer to Table 3 of the paper (arXiv:2411.04125) for baseline metrics.
How to Use

```python
from PIL import Image
from transformers import ViTImageProcessor, ViTForImageClassification

processor = ViTImageProcessor.from_pretrained("[your_model_id]")
model = ViTForImageClassification.from_pretrained("[your_model_id]")

image = Image.open("image.jpg")  # image to classify
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()
```
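Beyond the argmax class, a softmax over the logits gives a usable fake-probability score. A minimal plain-Python sketch; the `[real, fake]` label order is an assumption — check `model.config.id2label` for the actual mapping:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from the model; index order assumed [real, fake].
logits = [0.3, 2.1]
probs = softmax(logits)
fake_probability = probs[1]
```

In practice you would pass `outputs.logits[0].tolist()` from the snippet above instead of the hypothetical values.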
Training Details
Training Data
- 2.7M images from 15+ generator families, 4,600+ individual models
- Over 1.15TB worth of images
Training Hyperparameters
- Framework: PyTorch 2.0
- Precision: bf16 mixed
- Optimizer: AdamW (lr=5e-5)
- Epochs: 10
- Batch Size: 32
Evaluation
Testing Data
- 10k held-out images (5k real/5k synthetic) from unseen Diffusion/GAN models
| Metric | Value |
|---|---|
| Accuracy | 97.2% |
| F1 Score | 0.968 |
| AUC-ROC | 0.992 |
| FP Rate | 2.1% |
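The metrics above relate to confusion-matrix counts in the standard way. A small self-contained example with made-up counts (not the actual evaluation results) on a balanced 5k/5k test split:

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute accuracy, F1, and false-positive rate from counts,
    treating 'fake' as the positive class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fp_rate = fp / (fp + tn)
    return accuracy, f1, fp_rate

# Illustrative counts: 4,860 of 5,000 fakes flagged; 105 of 5,000 reals misflagged.
acc, f1, fpr = detection_metrics(tp=4860, fp=105, tn=4895, fn=140)
```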
Citation
BibTeX:
@misc{park2024communityforensics,
title={Community Forensics: Using Thousands of Generators to Train Fake Image Detectors},
author={Jeongsoo Park and Andrew Owens},
year={2024},
eprint={2411.04125},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.04125},
}
Model Card Authors:
Jeongsoo Park, Andrew Owens