---
license: apache-2.0
---
This model encodes a 224x224 RGB image into a 28x28x13-bit (1274-byte) latent. The compression ratio is 28x28x13/(224x224x24) = 1/118, or 0.203 bpp (the same as VQGAN_f8_8192).
The encoder and decoder have 12M parameters combined. Trained on LAION-Aesthetics V2 5+ for 130M images.
Training is guided by https://huggingface.co/laion/CLIP-ViT-B-32-laion2B-s34B-b79K (it's great, and better than OpenAI CLIP B/32) and https://github.com/dingkeyan93/DISTS. No GAN loss is used.
(Still training. The final checkpoint will be better.)
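
The size figures above can be checked with a few lines of Python. The sketch below also shows one way 28x28 codes of 13 bits each could be packed into exactly 1274 bytes. The packing layout (big-endian, codes concatenated in row-major order) and the helper names `pack_codes` and `unpack_codes` are assumptions made for illustration; they are not this model's actual API or storage format.

```python
# Figures taken from the description above.
IMAGE_SIDE = 224
LATENT_SIDE = 28
BITS_PER_CODE = 13
BITS_PER_PIXEL_RGB = 24

latent_bits = LATENT_SIDE * LATENT_SIDE * BITS_PER_CODE     # 10192
latent_bytes = latent_bits // 8                             # 1274
image_bits = IMAGE_SIDE * IMAGE_SIDE * BITS_PER_PIXEL_RGB   # 1204224
compression_ratio = image_bits / latent_bits                # about 118
bits_per_pixel = latent_bits / (IMAGE_SIDE * IMAGE_SIDE)    # 0.203125


def pack_codes(codes, bits=BITS_PER_CODE):
    """Pack integers in [0, 2**bits) into a big-endian byte string.

    Hypothetical layout: codes are concatenated most-significant first,
    and any unused bits at the end of the last byte are zero.
    """
    acc = 0
    for code in codes:
        if not 0 <= code < (1 << bits):
            raise ValueError(f"code {code} does not fit in {bits} bits")
        acc = (acc << bits) | code
    total_bits = bits * len(codes)
    n_bytes = (total_bits + 7) // 8
    acc <<= n_bytes * 8 - total_bits  # left-align the bit stream
    return acc.to_bytes(n_bytes, "big")


def unpack_codes(data, count, bits=BITS_PER_CODE):
    """Inverse of pack_codes: recover `count` integers from `data`."""
    acc = int.from_bytes(data, "big")
    acc >>= len(data) * 8 - bits * count  # drop the padding bits
    mask = (1 << bits) - 1
    return [(acc >> (bits * (count - 1 - i))) & mask for i in range(count)]


if __name__ == "__main__":
    # Dummy codes standing in for a real encoder output.
    codes = [(i * 37) % (1 << BITS_PER_CODE) for i in range(LATENT_SIDE ** 2)]
    packed = pack_codes(codes)
    print(len(packed), "bytes;", f"{bits_per_pixel:.3f} bpp;",
          f"ratio 1/{compression_ratio:.0f}")
    assert unpack_codes(packed, len(codes)) == codes
```

Because 28x28x13 = 10192 is a multiple of 8, the packed latent needs no padding bits, which is why the byte count is exactly 1274.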