SSD Latent Preview at Half-Size

The decoder produces a fast, approximate half-size preview image directly from the latents, without running the full VAE decoder.

The maximum supported resolution is between 768 and 1024 px.

The model processes a latent representation that has been reshaped from 4 channels to 64 channels, which differs from the original implementation.
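For intuition, a 4-to-64 channel rearrangement can be implemented as a space-to-depth (pixel unshuffle) with a factor of 4. The snippet below is an illustrative sketch of the resulting tensor shapes only; it is an assumption, not code taken from tea_64_model.py.

import torch
import torch.nn.functional as F

# Assumed rearrangement: fold 4x4 spatial blocks into the channel axis,
# turning (B, 4, H, W) into (B, 4 * 4 * 4, H / 4, W / 4) = (B, 64, H/4, W/4).
latents = torch.randn(1, 4, 128, 128)  # latent for a 1024 px image (1024 / 8 = 128)
reshaped = F.pixel_unshuffle(latents, downscale_factor=4)
print(reshaped.shape)  # torch.Size([1, 64, 32, 32])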

Inference

from diffusers import AutoencoderKL, StableDiffusionXLPipeline
from safetensors.torch import load_model
from tea_64_model import TeaDecoder
import torch
from torchvision import transforms

def preview_image(latents, pipe):
    # Undo the scaling factor applied by the pipeline before decoding.
    latents = latents / pipe.vae.config.scaling_factor
    tea = TeaDecoder(ch_in=4)
    load_model(tea, './vae_decoder.safetensors')
    tea.to(device='cuda')
    tea.eval()
    with torch.no_grad():
        # The decoder outputs images in [-1, 1]; map them to [0, 1].
        output = tea(latents.float()) / 2.0 + 0.5
    preview = transforms.ToPILImage()(output[0].clamp(0, 1).cpu())

    return preview

if __name__ == '__main__':
    pipe = StableDiffusionXLPipeline.from_pretrained('segmind/SSD-1B',
                                                     torch_dtype=torch.float16,
                                                     use_safetensors=True,
                                                     variant='fp16')
    pipe.to('cuda')
    # Return raw latents instead of decoded images.
    latents = pipe('cat playing piano',
                   negative_prompt='bad quality, low quality',
                   num_inference_steps=20,
                   output_type='latent').images
    preview = preview_image(latents, pipe)
    preview.save('cat.png')
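
A common use for a latent preview decoder is watching the image form during denoising. The sketch below is an assumption about how this could be wired up with diffusers' callback_on_step_end hook (available in recent diffusers releases); the step_preview name and per-step file naming are illustrative, not part of this repository.

def step_preview(pipeline, step, timestep, callback_kwargs):
    # callback_kwargs['latents'] holds the current (still noisy) latents.
    preview = preview_image(callback_kwargs['latents'], pipeline)
    preview.save(f'step_{step:03d}.png')
    return callback_kwargs

image = pipe('cat playing piano',
             negative_prompt='bad quality, low quality',
             num_inference_steps=20,
             callback_on_step_end=step_preview).images[0]

For repeated previews it would make sense to construct TeaDecoder once outside the callback instead of reloading the weights on every step, as preview_image above does.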

Datasets

  • Describable Textures Dataset (DTD)
  • AngelBottomless/Booru-Parquets
  • hastylol/nai3
  • jordandavis/fashion_num_people
  • mattmdjaga/human_parsing_dataset
  • recoilme/portraits_xs
  • skytnt/anime-segmentation
  • twodgirl/classicism
  • twodgirl/vndb