---
library_name: diffusers
license: apache-2.0
datasets:
- common-canvas/commoncatalog-cc-by
- alfredplpl/commoncatalog-cc-by-recap
language:
- en
---
# CommonArt-PoC
![tokyo](tokyo.png)
CommonArt is a text-to-image generation model trained only on openly licensed (CC BY) images.
Its architecture is based on the Diffusion Transformer (DiT), the same family of models used by Stable Diffusion 3 and Sora.
## How to Get Started with the Model
You can use this model with the diffusers library.
```python
import torch
from diffusers import Transformer2DModel, PixArtSigmaPipeline

device = "cpu"
weight_dtype = torch.float32

# Load the CommonArt-PoC transformer weights.
transformer = Transformer2DModel.from_pretrained(
    "alfredplpl/CommonArt-PoC",
    torch_dtype=weight_dtype,
    use_safetensors=True,
)

# Plug the transformer into the PixArt-Sigma pipeline
# (SDXL VAE + T5 text encoder).
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
    transformer=transformer,
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe.to(device)

prompt = "A picturesque photograph of a serene coastline, capturing the tranquility of a sunrise over the ocean. The image shows a wide expanse of gently rolling sandy beach, with clear, turquoise water stretching into the horizon. Seashells and pebbles are scattered along the shore, and the sun's rays create a golden hue on the water's surface. The distant outline of a lighthouse can be seen, adding to the quaint charm of the scene. The sky is painted with soft pastel colors of dawn, gradually transitioning from pink to blue, creating a sense of peacefulness and beauty."

image = pipe(prompt, guidance_scale=4.5, max_sequence_length=512).images[0]
image.save("beach.png")
```
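If a CUDA GPU is available, the same pipeline runs much faster in half precision. A minimal sketch (the fp16/CUDA settings here are an assumption, not part of the original example):

```python
# Optional: move the pipeline built above to the GPU and cast it to fp16.
pipe.to("cuda", torch.float16)
image = pipe(prompt, guidance_scale=4.5, max_sequence_length=512).images[0]
```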
## Model Details
### Model Description
- **Developed by:** alfredplpl
- **Funded by:** alfredplpl
- **Shared by:** alfredplpl
- **Model type:** Diffusion transformer
- **Language(s) (NLP):** English
- **License:** Apache-2.0
### Model Sources
- **Repository:** [PixArt-Sigma](https://github.com/PixArt-alpha/PixArt-sigma)
- **Paper:** [PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation](https://arxiv.org/abs/2403.04692)
## Uses
- Any purpose
### Direct Use
- To develop commercial text-to-image generation.
- To research non-commercial text-to-image generation.
### Out-of-Scope Use
- To generate misinformation.
## Bias, Risks, and Limitations
- Limited representation: the model is trained only on CC BY images, so its coverage of concepts and styles is narrower than that of models trained on web-scale data.
## Training Details
### Training Data
I used the following datasets to train the transformer (a loading sketch follows the list).
- CommonCatalog CC BY
- CommonCatalog CC BY Extension
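Both datasets are listed in this card's metadata and are available on the Hugging Face Hub. A minimal sketch for inspecting them with the `datasets` library; streaming mode and the `train` split name are assumptions, used to avoid a full download:

```python
from datasets import load_dataset

# Stream a few records instead of downloading the whole image set.
ds = load_dataset("common-canvas/commoncatalog-cc-by", split="train", streaming=True)
for example in ds.take(3):
    print(example.keys())
```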
#### Training Hyperparameters
- **Training regime:** fp16 mixed precision, using the PixArt-Sigma config below.
```python
_base_ = ['../PixArt_xl2_internal.py']
data_root = "/mnt/my_raid/pixart"
image_list_json = ['data_info.json']
data = dict(
type='InternalDataSigma', root='InternData', image_list_json=image_list_json, transform='default_train',
load_vae_feat=False, load_t5_feat=False,
)
image_size = 256
# model setting
model = 'PixArt_XL_2'
mixed_precision = 'fp16' # ['fp16', 'fp32', 'bf16']
fp32_attention = True
#load_from = "/mnt/my_raid/pixart/working/checkpoints/epoch_1_step_17500.pth" # https://huggingface.co/PixArt-alpha/PixArt-Sigma
#resume_from = dict(checkpoint="/mnt/my_raid/pixart/working/checkpoints/epoch_37_step_62039.pth", load_ema=False, resume_optimizer=True, resume_lr_scheduler=True)
vae_pretrained = "output/pretrained_models/pixart_sigma_sdxlvae_T5_diffusers/vae" # sdxl vae
multi_scale = False # if use multiscale dataset model training
pe_interpolation = 0.5
# training setting
num_workers = 10
train_batch_size = 64 # 64 as default
num_epochs = 200 # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.2
optimizer = dict(type='CAMEWrapper', lr=2e-5, weight_decay=0.0, betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16))
lr_schedule_args = dict(num_warmup_steps=1000)
#visualize=True
#train_sampling_steps = 3
#eval_sampling_steps = 3
log_interval = 20
save_model_epochs = 1
#save_model_steps = 2500
work_dir = 'output/debug'
# pixart-sigma
scale_factor = 0.13025
real_prompt_ratio = 0.5
model_max_length = 512
class_dropout_prob = 0.1
```
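Note that `class_dropout_prob = 0.1` above randomly drops captions during training so the model also learns an unconditional branch; this is what makes `guidance_scale` effective at inference. An illustrative sketch of that mechanism (not the repository's actual code):

```python
import random

def maybe_drop_caption(caption: str, class_dropout_prob: float = 0.1) -> str:
    # Classifier-free guidance dropout: with probability class_dropout_prob,
    # replace the caption with an empty string so the model also learns the
    # unconditional distribution.
    return "" if random.random() < class_dropout_prob else caption
```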
## How to Resume Training
1. Download the [checkpoint](checkpoint/epoch_50_step_116738.pth).
2. Set it as the `resume_from` checkpoint in the training config, as shown below.
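Concretely, mirroring the commented-out `resume_from` line in the config above (the path is wherever you saved the download):

```python
resume_from = dict(
    checkpoint="checkpoint/epoch_50_step_116738.pth",  # downloaded checkpoint
    load_ema=False,
    resume_optimizer=True,
    resume_lr_scheduler=True,
)
```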
## Environmental Impact
- **Hardware Type:** 2x NVIDIA RTX A6000
- **Hours used:** 700
- **Compute Region:** Japan
- **Carbon Emitted:** Not formally estimated; a rough upper bound on GPU energy is sketched below.
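A back-of-the-envelope bound, assuming both GPUs ran at the RTX A6000's 300 W board power for all 700 hours:

```python
# Rough upper bound on GPU energy (300 W per A6000 is the board power spec).
gpus, hours, watts = 2, 700, 300
energy_kwh = gpus * hours * watts / 1000  # = 420.0 kWh
```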
## Technical Specifications
### Model Architecture and Objective
Diffusion Transformer
### Compute Infrastructure
Desktop PC
#### Hardware
A6000x2
#### Software
[PixArt-Sigma repository](https://github.com/PixArt-alpha/PixArt-sigma)
## Model Card Contact
alfredplpl