recoilme committed on
Commit
3bdbe6e
·
verified ·
1 Parent(s): 7c569b3

Update README.md

Files changed (1)
  1. README.md +96 -1
README.md CHANGED
@@ -1,4 +1,99 @@
  ---
  license: apache-2.0
  ---
- # work in progress
  ---
  license: apache-2.0
+ pipeline_tag: text-to-image
  ---
+ # Work and training in progress!
+
+ # ⚡️Waifu: Efficient High-Resolution Waifu Synthesis
+
+ ## Waifu is a free text-to-image model that can efficiently generate images from prompts in 80 languages. Our goal is to create a small model without compromising on quality.
+
+ ### Core designs include:
+
+ (1) [**AuraDiffusion/16ch-vae**](https://huggingface.co/AuraDiffusion/16ch-vae): A fully open-source 16-channel VAE, natively trained in fp16. \
+ (2) [**Linear DiT**](https://github.com/NVlabs/Sana): We use a 1.6B-parameter DiT with linear attention. \
+ (3) [**MEXMA-SigLIP**](https://huggingface.co/visheratin/mexma-siglip): A model that combines the [MEXMA](https://huggingface.co/facebook/MEXMA) multilingual text encoder with an image encoder from the [SigLIP](https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384) model. This gives us a high-performance CLIP-style model for 80 languages. \
+ (4) Other: We use the Flow-Euler sampler, the Adafactor-Fused optimizer, and bf16 precision for training, and combine efficient caption labeling (MoonDream, CogVLM, human annotators, GPTs) with Danbooru tags to accelerate convergence.
+
+
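To make the "linear attention" in item (2) concrete, here is a minimal pure-Python sketch. It assumes a ReLU feature map, which is our reading of the Sana design and is not confirmed by this README; the function names and toy inputs are hypothetical. The point it illustrates is that keys and values are summarized once, so the cost grows linearly with the number of tokens rather than quadratically.

```python
def relu(vec):
    """Feature map applied to queries and keys (assumed: ReLU)."""
    return [max(0.0, x) for x in vec]


def linear_attention(queries, keys, values, eps=1e-6):
    """Attention without the N x N score matrix.

    Keys and values are folded into a (d_k x d_v) summary and a d_k-sized
    normalizer once; every query then reuses both.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # sum_j relu(k_j)^T v_j
    k_sum = [0.0] * d_k                     # sum_j relu(k_j)
    for k, v in zip(keys, values):
        fk = relu(k)
        for a in range(d_k):
            k_sum[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]

    outputs = []
    for q in queries:
        fq = relu(q)
        # eps guards against division by zero when a query maps to all zeros.
        norm = sum(fq[a] * k_sum[a] for a in range(d_k)) + eps
        outputs.append(
            [sum(fq[a] * kv[a][b] for a in range(d_k)) / norm for b in range(d_v)]
        )
    return outputs


# Toy inputs (hypothetical): two tokens, two channels.
Q = [[1.0, 0.0], [0.5, 2.0]]
K = [[1.0, 1.0], [0.0, 3.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
out = linear_attention(Q, K, V)
```

Each output row is a weighted average of the value rows, with weights proportional to `relu(q) · relu(k)` instead of a softmax over scores.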
+ ### Example
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+
+ from transformers import XLMRobertaTokenizerFast, XLMRobertaModel
+ from diffusers import FlowMatchEulerDiscreteScheduler
+ from diffusers.models import AutoencoderKL
+ from diffusers import SanaTransformer2DModel
+
+ pipe_id = "AiArtLab/waifu-2b"
+ variant = "fp16"
+
+ # tokenizer
+ tokenizer = XLMRobertaTokenizerFast.from_pretrained(
+     pipe_id,
+     subfolder="tokenizer"
+ )
+
+ # text_encoder
+ text_encoder = XLMRobertaModel.from_pretrained(
+     pipe_id,
+     variant=variant,
+     subfolder="text_encoder",
+     add_pooling_layer=False
+ ).to("cuda")
+
+ # scheduler
+ # Note: this object is not passed to the pipeline below, so the pipeline
+ # loads the scheduler configured in the repo. Pass scheduler=scheduler
+ # to from_pretrained to override it.
+ scheduler = FlowMatchEulerDiscreteScheduler(shift=1.0)
+
+ # VAE
+ vae = AutoencoderKL.from_pretrained(
+     pipe_id,
+     variant=variant,
+     subfolder="vae"
+ ).to("cuda")
+
+ # Transformer
+ transformer = SanaTransformer2DModel.from_pretrained(
+     pipe_id,
+     variant=variant,
+     subfolder="transformer"
+ ).to("cuda")
+
+ # Pipeline (trust_remote_code=True runs the custom pipeline code from the repo)
+ pipeline = DiffusionPipeline.from_pretrained(
+     pipe_id,
+     tokenizer=tokenizer,
+     text_encoder=text_encoder,
+     vae=vae,
+     transformer=transformer,
+     trust_remote_code=True,
+ ).to("cuda")
+ print(pipeline)
+
+ # A single prompt mixing Russian, English, Arabic, and French
+ prompt = 'аниме девушка, waifu, يبتسم جنسيا , sur le fond de la tour Eiffel'
+ generator = torch.Generator(device="cuda").manual_seed(42)
+
+ # The first element of the pipeline output is the list of generated images
+ images = pipeline(
+     prompt=prompt,
+     negative_prompt="",
+     generator=generator,
+ )[0]
+
+ for img in images:
+     img.show()
+     img.save('waifu.png')
+
+ ```
+
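The example above constructs `FlowMatchEulerDiscreteScheduler(shift=1.0)`. As a rough illustration of what a flow-matching Euler sampler does, here is a simplified pure-Python sketch. It assumes a uniform grid of noise levels from 1 to 0 and the usual shift formula; the real scheduler spaces its timesteps differently, and `shifted_sigmas`, `euler_sample`, and `toy_velocity` are hypothetical names, not part of diffusers.

```python
def shifted_sigmas(num_steps, shift=1.0):
    """Return num_steps + 1 noise levels from 1.0 (pure noise) to 0.0 (clean)."""
    sigmas = []
    for i in range(num_steps + 1):
        s = 1.0 - i / num_steps
        # shift > 1 spends more steps at high noise; shift = 1.0 changes nothing.
        sigmas.append(shift * s / (1.0 + (shift - 1.0) * s))
    return sigmas


def euler_sample(velocity_fn, x, num_steps=20, shift=1.0):
    """Integrate dx/dsigma = v(x, sigma) from sigma = 1 to 0 with Euler steps."""
    sigmas = shifted_sigmas(num_steps, shift)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma)
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for the transformer: the exact velocity of a straight path
# from `noise` to `target`, so Euler integration should recover `target`.
target = [0.5, -1.0, 2.0]


def toy_velocity(x, sigma):
    return [(xi - ti) / sigma for xi, ti in zip(x, target)]


noise = [1.3, 0.2, -0.7]
sample = euler_sample(toy_velocity, noise, num_steps=8)
```

In the real pipeline the velocity comes from the transformer's prediction on the noisy latents, and the final `x` is decoded by the VAE.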
+ ![image](./waifu.png)
+
+ ## How to cite
+
+ ```bibtex
+ @misc{Waifu,
+   url = {https://huggingface.co/AiArtLab/waifu-2b},
+   title = {waifu-2b},
+   author = {recoilme and muinez and femboysLover}
+ }
+ ```