tags:
- unconditional-image-generation
- pytorch
license: mit
min(DALL·E)
This is a fast, minimal port of DALL·E Mega. It has been stripped down for inference and converted to PyTorch. The only third-party dependencies are numpy, requests, pillow and torch.
Generating a 4x4 grid of DALL·E Mega images takes:
- 89 sec with a T4 in Colab
- 48 sec with a P100 in Colab
- 13 sec with an A100 on Replicate
Install
$ pip install min-dalle
Usage
Load the model parameters once and reuse the model to generate multiple images.
import torch
from min_dalle import MinDalle

model = MinDalle(
    models_root='./pretrained',
    dtype=torch.float32,
    is_mega=True,
    is_reusable=True
)
The required models will be downloaded to models_root if they are not already there. Set the dtype to torch.float16 to save GPU memory. If you have an Ampere architecture GPU you can use torch.bfloat16. Once everything has finished initializing, call generate_image with some text as many times as you want. Use a positive seed for reproducible results. Higher values for log2_supercondition_factor result in better agreement with the text but a narrower variety of generated images. Every image token is sampled from the top-$k$ most probable tokens.
image = model.generate_image(
    text='Nuclear explosion broccoli',
    seed=-1,
    grid_size=4,
    log2_k=6,
    log2_supercondition_factor=5,
    is_verbose=False
)
display(image)
credit: https://twitter.com/hardmaru/status/1544354119527596034
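As a rough illustration of the two sampling settings described above, the sketch below blends text-conditioned and unconditioned logits and then samples one token from the top-k candidates. It is not the code this library runs. The blending formula, the assumption that k equals 2 ** log2_k, and the helper names supercondition and sample_top_k are all illustrative assumptions.

```python
import math
import random


def supercondition(cond_logits, uncond_logits, log2_factor):
    """Blend conditioned and unconditioned logits (assumed formula).

    A larger factor pushes the result further toward the text-conditioned
    logits, which is one way to trade variety for prompt agreement.
    """
    a = 2 ** log2_factor
    return [u * (1 - a) + c * a for c, u in zip(cond_logits, uncond_logits)]


def sample_top_k(logits, log2_k, rng):
    """Sample one token index from the 2 ** log2_k most probable tokens."""
    k = min(2 ** log2_k, len(logits))
    # Indices of the k largest logits, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights over the kept logits; subtracting the peak keeps exp() stable.
    peak = logits[top[0]]
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
cond = [2.0, 1.0, 0.5, -1.0]      # toy logits given the text prompt
uncond = [1.0, 1.0, 1.0, 1.0]     # toy logits without the prompt
logits = supercondition(cond, uncond, log2_factor=1)  # factor a = 2
token = sample_top_k(logits, log2_k=1, rng=rng)       # k = 2
```

With k = 2 only the two highest-scoring tokens can ever be drawn, which is why a smaller log2_k narrows the output in the same way a larger supercondition factor does.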
Saving Individual Images
The images can also be generated as a FloatTensor in case you want to process them manually.
images = model.generate_images(
    text='Nuclear explosion broccoli',
    seed=-1,
    image_count=7,
    log2_k=6,
    log2_supercondition_factor=5,
    is_verbose=False
)
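To get an image into PIL format you will first have to move the images to the CPU, convert the tensor to a numpy array, and cast it to 8-bit integers. The cast is needed because PIL cannot build an RGB image from a float array; it assumes the pixel values are already in the 0-255 range.
images = images.to('cpu').numpy().astype('uint8')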
Then image $i$ can be converted to a PIL.Image and saved:
from PIL import Image

image = Image.fromarray(images[i])
image.save('image_{}.png'.format(i))
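PIL builds RGB images from 8-bit arrays, so a float batch has to be converted before it can be saved. The sketch below shows one cautious way to do that for every image in a batch. It assumes pixel values in the 0-255 range and a batch shaped (count, height, width, 3); to_uint8 is a hypothetical helper and the random array only stands in for real model output.

```python
import numpy as np


def to_uint8(images):
    """Round and clip a float image batch into uint8 (assumes a 0-255 scale)."""
    arr = np.asarray(images, dtype=np.float32)
    return np.clip(np.rint(arr), 0, 255).astype(np.uint8)


# Stand-in for the array obtained from the model: 7 tiny fake RGB images.
fake = np.random.default_rng(0).uniform(0, 255, size=(7, 8, 8, 3))
pixels = to_uint8(fake)

# One file name per image, following the naming used above.
file_names = ['image_{}.png'.format(i) for i in range(len(pixels))]
```

Each pixels[i] can then be passed to Image.fromarray and saved under file_names[i].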
Interactive
If the model is being used interactively (e.g. in a notebook) generate_image_stream can be used to generate a stream of images as the model is decoding. The detokenizer adds a slight delay for each image. Setting log2_mid_count to 3 results in a total of 2 ** 3 = 8 generated images. The only valid values for log2_mid_count are 0, 1, 2, 3, and 4. This is implemented in the Colab.
image_stream = model.generate_image_stream(
    text='Dali painting of WALL·E',
    seed=-1,
    grid_size=3,
    log2_mid_count=3,
    log2_k=6,
    log2_supercondition_factor=3,
    is_verbose=False
)

for image in image_stream:
    display(image)
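The relationship between log2_mid_count and the number of streamed images can be tabulated with a few lines of Python. This is only a sketch of the arithmetic stated above; mid_image_count is a hypothetical helper, not part of the library.

```python
VALID_LOG2_MID_COUNTS = (0, 1, 2, 3, 4)  # the valid values listed above


def mid_image_count(log2_mid_count):
    """Return how many images the stream yields for a given setting."""
    if log2_mid_count not in VALID_LOG2_MID_COUNTS or not isinstance(log2_mid_count, int):
        raise ValueError('log2_mid_count must be one of 0, 1, 2, 3, 4')
    return 2 ** log2_mid_count


# Map every valid setting to its image count.
counts = {n: mid_image_count(n) for n in VALID_LOG2_MID_COUNTS}
for n, count in counts.items():
    print('log2_mid_count={} -> {} images'.format(n, count))
```

Checking the value before starting a generation avoids discovering an invalid setting only after the model has been loaded.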
Command Line
Use image_from_text.py to generate images from the command line.
$ python image_from_text.py --text='artificial intelligence' --no-mega