dblasko/blip-dalle3-img2prompt

dblasko committed on Oct 13, 2023

Commit a9d5518 · 1 Parent(s): ef19e3f

Update README.md

Files changed (1)

README.md +31 -1

@@ -11,4 +11,34 @@ tags:
 # DALL·E 3 Image prompt reverse-engineering
-Pre-trained image-captioning model BLIP fine-tuned on a mixture of `laion/dalle-3-dataset` and semi-automatically gathered `(image, prompt)` data from DALL·E 3. It takes a generated image as an input and outputs a potential prompt to generate such an image, which can then be used as a base to generate similar images.
+Pre-trained image-captioning model BLIP, fine-tuned on a mixture of `laion/dalle-3-dataset` and semi-automatically gathered `(image, prompt)` data from DALL·E 3.
+It takes a generated image as input and outputs a candidate prompt for generating such an image, which can then serve as a starting point for generating similar images.
+### Usage:
+Loading the model and preprocessor:
+```python
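+import torch
+from transformers import BlipForConditionalGeneration, AutoProcessor
+# Use a GPU when one is available, otherwise fall back to the CPU
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model = BlipForConditionalGeneration.from_pretrained("dblasko/blip-dalle3-img2prompt").to(device)
+processor = AutoProcessor.from_pretrained("dblasko/blip-dalle3-img2prompt")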
+```
+Inference example on an image from `laion/dalle-3-dataset`:
+```python
+from datasets import load_dataset
+# Load only the first 1% of the train split for a fast download in this toy example
+dataset = load_dataset("laion/dalle-3-dataset", split="train[0%:1%]")
+img_index = 0  # index of the example to caption
+example = dataset[img_index]
+image = example["image"]
+caption = example["caption"]
+# Preprocess the image and move the tensors to the same device as the model
+inputs = processor(images=image, return_tensors="pt").to(device)
+pixel_values = inputs.pixel_values
+generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
+# Decode the generated token IDs back into text
+generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(f"Generated caption: {generated_caption}\nReal caption: {caption}")
+```
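
The inference steps in the README example could be wrapped in a small helper so they can be reused for any image, not only dataset samples. The sketch below is an illustration and not part of this commit: `image_to_prompt` is a hypothetical name, and it assumes `model` and `processor` are the objects loaded in the Usage section.

```python
def image_to_prompt(image, model, processor, device="cpu", max_length=50):
    """Return a candidate prompt for a single image.

    Hypothetical helper (not part of the model repository). `model` and
    `processor` are assumed to be the BLIP model and processor loaded as
    shown in the README's Usage section.
    """
    # Preprocess the image into model-ready tensors on the target device
    inputs = processor(images=image, return_tensors="pt").to(device)
    # Generate token IDs for the candidate prompt
    generated_ids = model.generate(
        pixel_values=inputs.pixel_values, max_length=max_length
    )
    # Decode the first (and only) generated sequence into text
    decoded = processor.batch_decode(generated_ids, skip_special_tokens=True)
    return decoded[0].strip()
```

With the `model`, `processor`, and `device` from the Usage section, a local file could then be captioned with something like `image_to_prompt(Image.open("generated.png").convert("RGB"), model, processor, device)`, where `Image` comes from Pillow and `generated.png` is a placeholder file name.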