---
library_name: diffusers
license: apache-2.0
datasets:
- common-canvas/commoncatalog-cc-by
- alfredplpl/commoncatalog-cc-by-recap
language:
- en
---
# CommonArt-PoC
![tokyo](tokyo.png)
CommonArt is a text-to-image generation model trained only on openly licensed (CC BY) images.
Its architecture is based on the Diffusion Transformer (DiT), the same family of models used by Stable Diffusion 3 and Sora.
## How to Get Started with the Model
You can use this model with the diffusers library.
```python
import torch
from diffusers import Transformer2DModel, PixArtSigmaPipeline

device = "cpu"
weight_dtype = torch.float32

# Load the CommonArt-PoC transformer weights.
transformer = Transformer2DModel.from_pretrained(
    "alfredplpl/CommonArt-PoC",
    torch_dtype=weight_dtype,
    use_safetensors=True,
)

# Plug the transformer into the PixArt-Sigma pipeline
# (SDXL VAE + T5 text encoder).
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
    transformer=transformer,
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe.to(device)

prompt = "A picturesque photograph of a serene coastline, capturing the tranquility of a sunrise over the ocean. The image shows a wide expanse of gently rolling sandy beach, with clear, turquoise water stretching into the horizon. Seashells and pebbles are scattered along the shore, and the sun's rays create a golden hue on the water's surface. The distant outline of a lighthouse can be seen, adding to the quaint charm of the scene. The sky is painted with soft pastel colors of dawn, gradually transitioning from pink to blue, creating a sense of peacefulness and beauty."

image = pipe(prompt, guidance_scale=4.5, max_sequence_length=512).images[0]
image.save("beach.png")
```
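If a CUDA GPU is available, the same pipeline runs much faster in half precision. A minimal sketch (the fp16/CUDA settings here are an assumption, not part of the original example):

```python
# Optional: move the pipeline built above to the GPU and cast it to fp16.
pipe.to("cuda", torch.float16)
image = pipe(prompt, guidance_scale=4.5, max_sequence_length=512).images[0]
```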
## Model Details
### Model Description
- **Developed by:** alfredplpl
- **Funded by:** alfredplpl
- **Shared by:** alfredplpl
- **Model type:** Diffusion transformer
- **Language(s) (NLP):** English
- **License:** Apache-2.0
### Model Sources
- **Repository:** [PixArt-Sigma](https://github.com/PixArt-alpha/PixArt-sigma)
- **Paper:** [PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation](https://arxiv.org/abs/2403.04692)
## Uses
- Any purpose
### Direct Use
- To develop commercial text-to-image generation.
- To research non-commercial text-to-image generation.
### Out-of-Scope Use
- To generate misinformation.
## Bias, Risks, and Limitations
- Limited representation: the model is trained only on CC BY images, so its coverage of concepts and styles is narrower than that of models trained on web-scale data.
## Training Details
### Training Data
I used the following datasets to train the transformer (a loading sketch follows the list).
- CommonCatalog CC BY
- CommonCatalog CC BY Extension
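Both datasets are listed in this card's metadata and are available on the Hugging Face Hub. A minimal sketch for inspecting them with the `datasets` library; streaming mode and the `train` split name are assumptions, used to avoid a full download:

```python
from datasets import load_dataset

# Stream a few records instead of downloading the whole image set.
ds = load_dataset("common-canvas/commoncatalog-cc-by", split="train", streaming=True)
for example in ds.take(3):
    print(example.keys())
```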
#### Training Hyperparameters
- **Training regime:** fp16 mixed precision, using the PixArt-Sigma config below.
```python
_base_ = ['../PixArt_xl2_internal.py']
data_root = "/mnt/my_raid/pixart"
image_list_json = ['data_info.json']
data = dict(
type='InternalDataSigma', root='InternData', image_list_json=image_list_json, transform='default_train',
load_vae_feat=False, load_t5_feat=False,
)
image_size = 256
# model setting
model = 'PixArt_XL_2'
mixed_precision = 'fp16' # ['fp16', 'fp32', 'bf16']
fp32_attention = True
#load_from = "/mnt/my_raid/pixart/working/checkpoints/epoch_1_step_17500.pth" # https://huggingface.co/PixArt-alpha/PixArt-Sigma
#resume_from = dict(checkpoint="/mnt/my_raid/pixart/working/checkpoints/epoch_37_step_62039.pth", load_ema=False, resume_optimizer=True, resume_lr_scheduler=True)
vae_pretrained = "output/pretrained_models/pixart_sigma_sdxlvae_T5_diffusers/vae" # sdxl vae
multi_scale = False # if use multiscale dataset model training
pe_interpolation = 0.5
# training setting
num_workers = 10
train_batch_size = 64 # 64 as default
num_epochs = 200 # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.2
optimizer = dict(type='CAMEWrapper', lr=2e-5, weight_decay=0.0, betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16))
lr_schedule_args = dict(num_warmup_steps=1000)
#visualize=True
#train_sampling_steps = 3
#eval_sampling_steps = 3
log_interval = 20
save_model_epochs = 1
#save_model_steps = 2500
work_dir = 'output/debug'
# pixart-sigma
scale_factor = 0.13025
real_prompt_ratio = 0.5
model_max_length = 512
class_dropout_prob = 0.1
```
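Note that `class_dropout_prob = 0.1` above randomly drops captions during training so the model also learns an unconditional branch; this is what makes `guidance_scale` effective at inference. An illustrative sketch of that mechanism (not the repository's actual code):

```python
import random

def maybe_drop_caption(caption: str, class_dropout_prob: float = 0.1) -> str:
    # Classifier-free guidance dropout: with probability class_dropout_prob,
    # replace the caption with an empty string so the model also learns the
    # unconditional distribution.
    return "" if random.random() < class_dropout_prob else caption
```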
## How to Resume Training
1. Download the [checkpoint](checkpoint/epoch_50_step_116738.pth).
2. Set it as the `resume_from` checkpoint in the training config, as shown below.
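Concretely, mirroring the commented-out `resume_from` line in the config above (the path is wherever you saved the download):

```python
resume_from = dict(
    checkpoint="checkpoint/epoch_50_step_116738.pth",  # downloaded checkpoint
    load_ema=False,
    resume_optimizer=True,
    resume_lr_scheduler=True,
)
```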
## Environmental Impact
- **Hardware Type:** 2x NVIDIA RTX A6000
- **Hours used:** 700
- **Compute Region:** Japan
- **Carbon Emitted:** Not formally estimated; a rough upper bound on GPU energy is sketched below.
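A back-of-the-envelope bound, assuming both GPUs ran at the RTX A6000's 300 W board power for all 700 hours:

```python
# Rough upper bound on GPU energy (300 W per A6000 is the board power spec).
gpus, hours, watts = 2, 700, 300
energy_kwh = gpus * hours * watts / 1000  # = 420.0 kWh
```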
## Technical Specifications
### Model Architecture and Objective
Diffusion Transformer
### Compute Infrastructure
Desktop PC
#### Hardware
A6000x2
#### Software
[PixArt-Sigma repository](https://github.com/PixArt-alpha/PixArt-sigma)
## Model Card Contact
alfredplpl