---
license: cc-by-nc-sa-4.0
---

# Model Card for Direct Discriminative Optimization (ICML 2025 Spotlight)

[**Paper**](https://arxiv.org/abs/2503.01103) | [**Website**](https://research.nvidia.com/labs/dir/ddo) | [**Code**](https://github.com/NVlabs/DDO)

# Model Overview

## Description:

**Direct Discriminative Optimization (DDO)** is a finetuning method for likelihood-based visual generative models. This repository provides models finetuned with DDO from previous state-of-the-art **diffusion models and visual autoregressive models**: [EDM](https://github.com/NVlabs/edm) on CIFAR-10, [EDM2](https://github.com/NVlabs/edm2) on ImageNet-64 and ImageNet-512, and [VAR](https://github.com/FoundationVision/VAR) on ImageNet-256. DDO **significantly improves the generation quality** of these base models.

The models are ready for non-commercial use.

## Model Developer

Base model weights are from NVIDIA (EDM, EDM2) and ByteDance (VAR). All models were finetuned by NVIDIA.

## Model Versions:

- **EDM-based models** (CIFAR-10): `edm-cifar10-uncond-vp-ddo.pkl` (unconditional), `edm-cifar10-cond-vp-ddo.pkl` (class-conditional)
- **EDM2-based models**: `edm2-img64-s-ddo.pkl` (ImageNet-64), `edm2-img512-l-ddo.pkl` (ImageNet-512)
- **VAR-based models** (ImageNet-256): `var_d16-ddo.pth`, `var_d30-ddo.pth`

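
The checkpoint names above appear to follow a regular pattern: base model family, dataset, variant, a `-ddo` suffix, and a `.pkl` or `.pth` extension. The sketch below, assuming that pattern holds, maps a file name to the base model family and dataset listed in this card. The helper `describe_checkpoint` and its lookup tables are hypothetical illustrations, not part of the DDO code release.

```python
from pathlib import Path

# Hypothetical lookup built from the model list in this card:
# file-name prefix -> (base model family, dataset).
_PREFIXES = {
    "edm-cifar10": ("EDM", "CIFAR-10"),
    "edm2-img64": ("EDM2", "ImageNet-64"),
    "edm2-img512": ("EDM2", "ImageNet-512"),
    "var_d": ("VAR", "ImageNet-256"),
}

# Assumed file formats: EDM/EDM2-based files are pickles, VAR-based files are PyTorch .pth files.
_FORMATS = {".pkl": "pickle", ".pth": "torch"}


def describe_checkpoint(filename: str) -> dict:
    """Classify a DDO checkpoint file name, assuming the naming pattern listed above."""
    path = Path(filename)
    stem = path.stem  # e.g. "edm2-img64-s-ddo"
    if not stem.endswith("-ddo") or path.suffix not in _FORMATS:
        raise ValueError(f"not a recognized DDO checkpoint name: {filename}")
    for prefix, (family, dataset) in _PREFIXES.items():
        if stem.startswith(prefix):
            return {"family": family, "dataset": dataset, "format": _FORMATS[path.suffix]}
    raise ValueError(f"unknown base model for: {filename}")


if __name__ == "__main__":
    for name in ("edm-cifar10-cond-vp-ddo.pkl", "var_d30-ddo.pth"):
        print(name, "->", describe_checkpoint(name))
```

Keeping the mapping in a single prefix table means a newly released checkpoint would only need one new entry.
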
## License/Terms of Use:

All materials, including source code and released models, are licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).

EDM base models were originally shared under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://github.com/NVlabs/edm/blob/main/LICENSE.txt).

EDM2 base models were originally shared under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://github.com/NVlabs/edm2/blob/main/LICENSE.txt).

VAR base models were originally shared under the [MIT License](https://github.com/FoundationVision/VAR/blob/main/LICENSE).

## Usage:

The models are intended for image generation on academic benchmarks. Inference scripts are provided in the [GitHub repository](https://github.com/NVlabs/DDO).

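
For loading a checkpoint outside the provided scripts, a minimal sketch follows. It assumes, without confirmation from this card, that the `.pkl` files follow the EDM/EDM2 release convention of a pickled dict holding the network under the `"ema"` key, and that the `.pth` files are PyTorch checkpoints as in the VAR release. Unpickling the real EDM/EDM2 files generally also requires the base repository's modules (e.g. `dnnlib`, `torch_utils`) to be importable. The function name `load_ddo_checkpoint` is hypothetical; the repository's inference scripts remain the reference.

```python
import pickle
from pathlib import Path


def load_ddo_checkpoint(path: str):
    """Load a DDO checkpoint (sketch; the file layout is assumed, not documented here)."""
    suffix = Path(path).suffix
    if suffix == ".pkl":
        # Assumed EDM/EDM2 convention: a pickled dict with the EMA network under "ema".
        # Only unpickle files from trusted sources: pickle can execute arbitrary code.
        with open(path, "rb") as f:
            data = pickle.load(f)
        return data["ema"]
    if suffix == ".pth":
        # Assumed VAR convention: a PyTorch checkpoint (typically a state dict)
        # to be passed to the matching VAR model's load_state_dict().
        import torch  # imported only when a .pth file is requested

        return torch.load(path, map_location="cpu")
    raise ValueError(f"unsupported checkpoint type: {path}")
```

Under the EDM/EDM2 convention, the returned network would then typically be prepared for inference, e.g. `net.eval().requires_grad_(False).to("cuda")`, before sampling.
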
## Software Integration:

**Supported Hardware Microarchitecture Compatibility:**

* NVIDIA Ampere (e.g., A100)
* NVIDIA Hopper (e.g., H100)

Note: We have tested model inference only with FP16 precision on Ampere and Hopper GPUs. The models should also be compatible with older NVIDIA GPU architectures that support FP16 (e.g., Volta), but we have not verified this.

**Operating System(s):**

* Linux (not tested on other operating systems)

## Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

## Citation

```
@article{zheng2025direct,
  title={Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator},
  author={Zheng, Kaiwen and Chen, Yongxin and Chen, Huayu and He, Guande and Liu, Ming-Yu and Zhu, Jun and Zhang, Qinsheng},
  journal={arXiv preprint arXiv:2503.01103},
  year={2025}
}
```