---
license: mit
pipeline_tag: image-classification
tags:
- image-classification
- timm
- transformers
- detection
- deepfake
- forensics
- deepfake_detection
- community
- opensight
base_model:
- timm/vit_small_patch16_384.augreg_in21k_ft_in1k
library_name: transformers
---

# Trained on 2.7M samples across 4,803 generators (see Training Data)

**Uploaded for community validation as part of OpenSight**, an upcoming open-source framework for adaptive deepfake detection.

**Project OpenSight HF Spaces are coming soon, with an eval playground and, eventually, a leaderboard. Preview:**

![image/png](https://cdn-uploads.huggingface.co/production/uploads/639daf827270667011153fbc/AUmW697OefKN83BClM1ae.png)

## Model Details
### Model Description
A Vision Transformer (ViT) model for detecting AI-generated images in forensic applications, trained on the largest such dataset to date (2.7M images; see Training Data).

- **Developed by:** Jeongsoo Park and Andrew Owens, University of Michigan
- **Model type:** Vision Transformer (ViT-Small)
- **License:** MIT (compatible with the CreativeML OpenRAIL-M license referenced in [arXiv:2411.04125v1](https://arxiv.org/abs/2411.04125v1))
- **Finetuned from:** timm/vit_small_patch16_384.augreg_in21k_ft_in1k
- **Adapted for HF inference compatibility by:** AI Without Borders
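
For orientation, the input resolution and patch size encoded in the base model name determine how many tokens the transformer processes per image. The sketch below works that out. It assumes the standard ViT layout with a single class token prepended to the patch tokens; `vit_token_count` is a hypothetical helper written for this card, not part of timm or transformers.

```python
def vit_token_count(image_size: int, patch_size: int, extra_tokens: int = 1) -> int:
    """Sequence length of a square-input ViT.

    extra_tokens=1 is an assumption: one class token, no distillation token.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + extra_tokens


# Values read off the base model name: vit_small_patch16_384
tokens = vit_token_count(image_size=384, patch_size=16)
print(tokens)
```

At 384×384 with 16×16 patches this is a 24×24 patch grid, so each image becomes a sequence of 577 tokens.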

**An HF Space showcasing various ways to run ultra-fast inference will be open-sourced shortly. Follow us for updates; we will be releasing a slew of projects in the coming weeks.**

### Links
- **Repository:** [JeongsooP/Community-Forensics](https://github.com/JeongsooP/Community-Forensics)
- **Paper:** [arXiv:2411.04125](https://arxiv.org/pdf/2411.04125)

## Training Details
### Training Data
- 2.7M images from 15+ generators, 4,600+ models
- Over 1.15 TB of images

### Training Hyperparameters
- **Framework:** PyTorch 2.0
- **Precision:** bf16 mixed
- **Optimizer:** AdamW (lr=5e-5)
- **Epochs:** 10
- **Batch Size:** 32
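
To give a sense of scale, the hyperparameters above can be collected into a small config object and used to estimate the number of optimizer steps. This is only a sketch: `TrainingConfig` and `optimizer_steps` are hypothetical names, and the estimate assumes that the batch size of 32 is the global batch size, that there is no gradient accumulation, and that each of roughly 2.7M images is seen once per epoch. None of these assumptions is confirmed by the card.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as listed in this card (field names are illustrative)."""
    framework: str = "PyTorch 2.0"
    precision: str = "bf16 mixed"
    optimizer: str = "AdamW"
    learning_rate: float = 5e-5
    epochs: int = 10
    batch_size: int = 32


def optimizer_steps(num_images: int, cfg: TrainingConfig) -> int:
    """Estimated optimizer steps, assuming one step per batch."""
    steps_per_epoch = math.ceil(num_images / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


cfg = TrainingConfig()
# 2_700_000 is the rounded dataset size quoted under Training Data.
total_steps = optimizer_steps(2_700_000, cfg)
print(total_steps)
```

Under those assumptions the run amounts to about 84,375 steps per epoch, or roughly 844k steps over 10 epochs.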

## Evaluation
### Unverified Testing Results
- These results are unverified only because we currently lack the resources to evaluate on a dataset larger than 1.4 TB.

| Metric        | Value |
|---------------|-------|
| Accuracy      | 97.2% |
| F1 Score      | 0.968 |
| AUC-ROC       | 0.992 |
| FP Rate       | 2.1%  |

![image/png](https://cdn-uploads.huggingface.co/production/uploads/639daf827270667011153fbc/g-dLzxLBw1RAuiplvFCxh.png)
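
For anyone who wants to check the figures in the table on their own data, the four metrics can be computed from ground-truth labels and model scores as in the dependency-free sketch below. It assumes that label `1` means "AI-generated" (the positive class), that scores are probabilities of that class, and that a 0.5 threshold is used. The card does not state which conventions produced the reported numbers, and all function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # generated images flagged as generated
    fp: int  # real images flagged as generated
    tn: int  # real images passed as real
    fn: int  # generated images passed as real


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> ConfusionCounts:
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted_fake = score >= threshold
        if predicted_fake and label == 1:
            tp += 1
        elif predicted_fake:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def accuracy(c: ConfusionCounts) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def f1_score(c: ConfusionCounts) -> float:
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def false_positive_rate(c: ConfusionCounts) -> float:
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Pairwise O(P*N) form; ties count as half. Fine for small samples only.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only; these are not results from this model.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
counts = confusion_counts(labels, scores)
print(accuracy(counts), f1_score(counts), false_positive_rate(counts), auc_roc(labels, scores))
```

Note that accuracy, F1, and FP rate all depend on the chosen threshold, while AUC-ROC does not, so comparisons against the table are only meaningful at a matching operating point.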

## Re-sampled and refined dataset

- **Coming soon™**

## Citation
**BibTeX:**
```bibtex
@misc{park2024communityforensics,
    title={Community Forensics: Using Thousands of Generators to Train Fake Image Detectors}, 
    author={Jeongsoo Park and Andrew Owens},
    year={2024},
    eprint={2411.04125},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2411.04125}, 
}
```