Update README.md
README.md
---
license: apache-2.0
pipeline_tag: text-to-image
---

# Work / train in progress

⚡️Waifu: efficient high-resolution waifu synthesis

waifu is a free text-to-image model that can efficiently generate images in 80 languages. Our goal is to create a small model without compromising on quality.

[…]

(1) [**AuraDiffusion/16ch-vae**](https://huggingface.co/AuraDiffusion/16ch-vae): a fully open-source 16-channel VAE, natively trained in fp16. \
(2) [**Linear DiT**](https://github.com/NVlabs/Sana): we use a 1.6B DiT transformer with linear attention (a generic sketch of the mechanism follows this list). \
(3) [**MEXMA-SigLIP**](https://huggingface.co/visheratin/mexma-siglip): MEXMA-SigLIP combines the [MEXMA](https://huggingface.co/facebook/MEXMA) multilingual text encoder with an image encoder from the [SigLIP](https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384) model, which gives us a high-performance CLIP model for 80 languages. \
(4) Other: we use the Flow-Euler sampler, the Adafactor-Fused optimizer, and bf16 precision for training, and we combine efficient caption labeling (MoonDream, CogVLM, human, and GPT captions) with danbooru tags to accelerate convergence (a sampler sketch also follows this list).
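To make the linear-attention idea in (2) concrete, here is a minimal, generic sketch of kernelized linear attention in PyTorch (in the style of Katharopoulos et al.). It illustrates the technique only; it is not Sana's or waifu's actual implementation, and every name in it is our own.

```python
import torch

def linear_attention(q, k, v, eps=1e-6):
    # Kernel feature map: elu(x) + 1 keeps activations strictly positive.
    q = torch.nn.functional.elu(q) + 1
    k = torch.nn.functional.elu(k) + 1
    # Aggregate keys and values once: O(n * d^2) instead of O(n^2 * d).
    kv = torch.einsum("bnd,bne->bde", k, v)   # (batch, d_k, d_v)
    k_sum = k.sum(dim=1)                      # (batch, d_k)
    # Per-query normalizer so attention weights sum to one.
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k_sum) + eps)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

# Example: 2 sequences of 1024 tokens with 64-dim heads.
q = torch.randn(2, 1024, 64)
k = torch.randn(2, 1024, 64)
v = torch.randn(2, 1024, 64)
out = linear_attention(q, k, v)   # shape (2, 1024, 64)
```

Because keys and values are aggregated once, the cost grows linearly with token count rather than quadratically, which is what makes attention over high-resolution latent grids tractable.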
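And here is a minimal sketch of what a Flow-Euler sampler does, assuming a rectified-flow-style model that predicts a velocity field. `model` and its call signature are hypothetical placeholders, not waifu's actual inference code.

```python
import torch

@torch.no_grad()
def flow_euler_sample(model, latents, num_steps=28):
    """Integrate the learned velocity field from t=1 (noise) to t=0 (data)
    with plain explicit Euler steps."""
    timesteps = torch.linspace(1.0, 0.0, num_steps + 1)
    for i in range(num_steps):
        t, t_next = timesteps[i], timesteps[i + 1]
        # Hypothetical signature: the model predicts dx/dt at time t.
        velocity = model(latents, t)
        latents = latents + (t_next - t) * velocity  # Euler step (dt < 0)
    return latents
```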
## Example
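While training is still in progress, usage would look roughly like the following. This is a hypothetical sketch: the repository id and pipeline class are placeholders that assume the released checkpoint ends up diffusers-compatible; check this model card for the confirmed loading code.

```python
import torch
from diffusers import DiffusionPipeline

# Hypothetical repo id; replace with the actual checkpoint path.
pipe = DiffusionPipeline.from_pretrained(
    "your-namespace/waifu", torch_dtype=torch.bfloat16
).to("cuda")

# The multilingual text encoder means prompts need not be English.
# Russian for "a watercolor portrait of a girl with silver hair":
image = pipe("акварельный портрет девушки с серебряными волосами").images[0]
image.save("waifu.png")
```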