Yuxuan Zhang committed on
Commit · 71306a5 · 1 Parent(s): fb6f572
update
README.md CHANGED
@@ -1,10 +1,10 @@
  ---
  license: apache-2.0
  language:
- - zh
- - en
  base_model:
- - THUDM/glm-4-9b
  pipeline_tag: text-to-image
  library_name: diffusers
  ---
@@ -25,18 +25,19 @@ library_name: diffusers

  ## Inference Requirements and Model Introduction

- + Resolution: Width and height must be between `512px` and `2048px`, divisible by `32`, and ensure the maximum number of
  + Precision: BF16 / FP32 (FP16 is not supported as it will cause overflow resulting in completely black images)

  Using `BF16` precision with `batchsize=4` for testing, the memory usage is shown in the table below:

- | Resolution
-
- | 512 * 512
- | 1280 * 720
- | 1024 * 1024
- | 1920 * 1280
- | 2048 * 2048

  ## Quick Start

@@ -52,6 +53,7 @@ Then, run the following code:

  ```python
  from diffusers import CogView4Pipeline
  pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)

  # Enable the following to reduce GPU memory usage
@@ -72,52 +74,52 @@ image = pipe(
  image.save("cogview4.png")
  ```

-

  We've tested on multiple benchmarks and achieved the following scores:

-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-

  ## Chinese Text Accuracy Evaluation

- | model
-
- | kolors
- | **
-

  ## Citation

@@ -1,10 +1,10 @@
  ---
  license: apache-2.0
  language:
+ - zh
+ - en
  base_model:
+ - THUDM/glm-4-9b
  pipeline_tag: text-to-image
  library_name: diffusers
  ---
@@ -25,18 +25,19 @@ library_name: diffusers

  ## Inference Requirements and Model Introduction

+ + Resolution: Width and height must be between `512px` and `2048px`, divisible by `32`, and ensure the maximum number of
+ pixels does not exceed `2^21` px.
  + Precision: BF16 / FP32 (FP16 is not supported as it will cause overflow resulting in completely black images)
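
As a rough illustration, the resolution rule above can be checked before calling the pipeline. The sketch below is not part of the model card or of `diffusers`: `is_valid_resolution` is a hypothetical helper, and it assumes that "maximum number of pixels" means `width * height`.

```python
# Hypothetical helper; not part of diffusers or the CogView4 repository.
MIN_SIDE = 512
MAX_SIDE = 2048
MULTIPLE = 32
MAX_PIXELS = 2 ** 21  # assumption: the limit applies to width * height


def is_valid_resolution(width: int, height: int) -> bool:
    """Return True if (width, height) satisfies the constraints as stated."""
    sides = (width, height)
    sides_in_range = all(MIN_SIDE <= side <= MAX_SIDE for side in sides)
    sides_aligned = all(side % MULTIPLE == 0 for side in sides)
    return sides_in_range and sides_aligned and width * height <= MAX_PIXELS
```

Under this reading, `1024 * 1024` and `2048 * 1024` pass, while `2048 * 2048` (`2^22` pixels) does not, even though each side is within range.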

  Using `BF16` precision with `batchsize=4` for testing, the memory usage is shown in the table below:

+ | Resolution  | enable_model_cpu_offload OFF | enable_model_cpu_offload ON | enable_model_cpu_offload ON <br> Text Encoder 4bit |
+ |-------------|------------------------------|-----------------------------|----------------------------------------------------|
+ | 512 * 512   | 33GB                         | 20GB                        | 13GB                                               |
+ | 1280 * 720  | 35GB                         | 20GB                        | 13GB                                               |
+ | 1024 * 1024 | 35GB                         | 20GB                        | 13GB                                               |
+ | 1920 * 1280 | 39GB                         | 20GB                        | 14GB                                               |
+ | 2048 * 2048 | 43GB                         | 21GB                        | 14GB                                               |
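
One way to read the table above is as a lookup: given a resolution and the GPU memory available, pick the least restrictive setting whose measured usage fits. The sketch below only illustrates that reading. The helper `pick_mode` and the mode labels are hypothetical, the figures are copied from the table (GB, `BF16`, `batchsize=4`), and real usage may differ with other batch sizes or hardware.

```python
# Figures copied from the table above (GB, BF16, batchsize=4).
# The mode labels and the helper are hypothetical, for illustration only.
MODES = ("offload_off", "offload_on", "offload_on_text_encoder_4bit")
MEMORY_GB = {
    (512, 512): (33, 20, 13),
    (1280, 720): (35, 20, 13),
    (1024, 1024): (35, 20, 13),
    (1920, 1280): (39, 20, 14),
    (2048, 2048): (43, 21, 14),
}


def pick_mode(resolution, available_gb):
    """Return the first mode whose measured usage fits, or None if none does."""
    for mode, needed_gb in zip(MODES, MEMORY_GB[resolution]):
        if needed_gb <= available_gb:
            return mode
    return None


print(pick_mode((1024, 1024), 24))
```

For a 24GB card at `1024 * 1024`, this selects the CPU-offload column, since 35GB without offload does not fit but 20GB with offload does.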

  ## Quick Start

@@ -52,6 +53,7 @@ Then, run the following code:

  ```python
  from diffusers import CogView4Pipeline
+
  pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)

  # Enable the following to reduce GPU memory usage
@@ -72,52 +74,52 @@ image = pipe(
  image.save("cogview4.png")
  ```

+ ### Model Metrics

  We've tested on multiple benchmarks and achieved the following scores:

+ #### DPG-Bench
+
+ | Model           | Overall   | Global    | Entity    | Attribute | Relation  | Other     |
+ |-----------------|-----------|-----------|-----------|-----------|-----------|-----------|
+ | SDXL            | 74.65     | 83.27     | 82.43     | 80.91     | 86.76     | 80.41     |
+ | PixArt-alpha    | 71.11     | 74.97     | 79.32     | 78.60     | 82.57     | 76.96     |
+ | SD3-Medium      | 84.08     | 87.90     | **91.01** | 88.83     | 80.70     | 88.68     |
+ | DALL-E 3        | 83.50     | **90.97** | 89.61     | 88.39     | 90.58     | 89.83     |
+ | Flux.1-dev      | 83.79     | 85.80     | 86.79     | 89.98     | 90.04     | **89.90** |
+ | Janus-Pro-7B    | 84.19     | 86.90     | 88.90     | 89.40     | 89.32     | 89.48     |
+ | **CogView4-6B** | **85.13** | 83.85     | 90.35     | **91.17** | **91.14** | 87.29     |
+
+ #### GenEval
+
+ | Model           | Overall  | Single Obj. | Two Obj. | Counting | Colors   | Position | Color attribution |
+ |-----------------|----------|-------------|----------|----------|----------|----------|-------------------|
+ | SDXL            | 0.55     | 0.98        | 0.74     | 0.39     | 0.85     | 0.15     | 0.23              |
+ | PixArt-alpha    | 0.48     | 0.98        | 0.50     | 0.44     | 0.80     | 0.08     | 0.07              |
+ | SD3-Medium      | 0.74     | **0.99**    | **0.94** | 0.72     | 0.89     | 0.33     | 0.60              |
+ | DALL-E 3        | 0.67     | 0.96        | 0.87     | 0.47     | 0.83     | 0.43     | 0.45              |
+ | Flux.1-dev      | 0.66     | 0.98        | 0.79     | **0.73** | 0.77     | 0.22     | 0.45              |
+ | Janus-Pro-7B    | **0.80** | **0.99**    | 0.89     | 0.59     | **0.90** | **0.79** | **0.66**          |
+ | **CogView4-6B** | 0.73     | **0.99**    | 0.86     | 0.66     | 0.79     | 0.48     | 0.58              |
+
+ #### T2I-CompBench
+
+ | Model           | Color      | Shape      | Texture    | 2D-Spatial | 3D-Spatial | Numeracy   | Non-spatial Clip | Complex 3-in-1 |
+ |-----------------|------------|------------|------------|------------|------------|------------|------------------|----------------|
+ | SDXL            | 0.5879     | 0.4687     | 0.5299     | 0.2133     | 0.3566     | 0.4988     | 0.3119           | 0.3237         |
+ | PixArt-alpha    | 0.6690     | 0.4927     | 0.6477     | 0.2064     | 0.3901     | 0.5058     | **0.3197**       | 0.3433         |
+ | SD3-Medium      | **0.8132** | 0.5885     | **0.7334** | **0.3200** | **0.4084** | 0.6174     | 0.3140           | 0.3771         |
+ | DALL-E 3        | 0.7785     | **0.6205** | 0.7036     | 0.2865     | 0.3744     | 0.5880     | 0.3003           | 0.3773         |
+ | Flux.1-dev      | 0.7572     | 0.5066     | 0.6300     | 0.2700     | 0.3992     | 0.6165     | 0.3065           | 0.3628         |
+ | Janus-Pro-7B    | 0.5145     | 0.3323     | 0.4069     | 0.1566     | 0.2753     | 0.4406     | 0.3137           | 0.3806         |
+ | **CogView4-6B** | 0.7786     | 0.5880     | 0.6983     | 0.3075     | 0.3708     | **0.6626** | 0.3056           | **0.3869**     |

  ## Chinese Text Accuracy Evaluation

+ | Model           | Precision  | Recall     | F1 Score   | pick@4     |
+ |-----------------|------------|------------|------------|------------|
+ | kolors          | 0.6094     | 0.1886     | 0.2880     | 0.1633     |
+ | **CogView4-6B** | **0.6969** | **0.5532** | **0.6168** | **0.3265** |
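
For readers who want to sanity-check the table above, the F1 column is close to the harmonic mean of the Precision and Recall columns. The sketch below assumes the standard definition `F1 = 2PR / (P + R)` applied to the aggregate values; the evaluation itself may compute F1 differently (for example, per sample before averaging), so small discrepancies are expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
reported = [("kolors", 0.6094, 0.1886), ("CogView4-6B", 0.6969, 0.5532)]
for name, precision, recall in reported:
    print(f"{name}: F1 = {f1_score(precision, recall):.4f}")
```

Both results agree with the reported F1 scores to about three decimal places.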

  ## Citation
