---
license: apache-2.0
datasets:
- ILSVRC/imagenet-1k
model-index:
- name: VQGAN+
  results:
  - task:
      type: image-generation
    dataset:
      name: ILSVRC/imagenet-1k
      type: ILSVRC/imagenet-1k
    metrics:
    - name: rFID
      type: rFID
      value: 1.39
    - name: InceptionScore
      type: InceptionScore
      value: 193.9
    - name: LPIPS
      type: LPIPS
      value: 0.315
    - name: PSNR
      type: PSNR
      value: 21
    - name: SSIM
      type: SSIM
      value: 0.55
    - name: CodebookUsage
      type: CodebookUsage
      value: 1.0
---

This model is the VQGAN+ tokenizer with a 12-bit vocabulary (4096 codebook entries). It uses a downsampling factor of 16 and is trained on ImageNet at a resolution of 256×256.
You can find more details on the [project page](https://weber-mark.github.io/projects/maskbit.html) and in the [paper](https://arxiv.org/abs/2409.16211).
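
The numbers in the description determine the size of the token grid. The following is a minimal sketch of that arithmetic, not code from the model's repository: the function name `token_grid_stats` and its return keys are hypothetical, and it assumes square inputs and one codebook index per latent position.

```python
def token_grid_stats(image_size: int = 256,
                     downsample_factor: int = 16,
                     vocab_bits: int = 12) -> dict:
    """Derive token-grid figures from the values stated in the model card.

    Hypothetical helper for illustration only. Assumes a square image and
    a single codebook index per spatial position of the latent grid.
    """
    if image_size % downsample_factor != 0:
        raise ValueError("image_size must be divisible by downsample_factor")

    latent_side = image_size // downsample_factor   # side length of the token grid
    tokens_per_image = latent_side * latent_side    # number of discrete tokens
    codebook_size = 2 ** vocab_bits                 # number of distinct token ids
    bits_per_image = tokens_per_image * vocab_bits  # raw size of the token map

    return {
        "latent_side": latent_side,
        "tokens_per_image": tokens_per_image,
        "codebook_size": codebook_size,
        "bits_per_image": bits_per_image,
    }


if __name__ == "__main__":
    for key, value in token_grid_stats().items():
        print(f"{key}: {value}")
```

Under these assumptions, a 256×256 image maps to a 16×16 grid of 256 tokens, each drawn from 4096 possible ids.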