|
---
datasets:
- laion/dalle-3-dataset
language:
- en
tags:
- art
- image-to-text
- image-captioning
---
|
|
|
# DALL·E 3 image prompt reverse-engineering
|
|
|
The pre-trained image-captioning model BLIP, fine-tuned on a mixture of `laion/dalle-3-dataset` and semi-automatically gathered `(image, prompt)` pairs from DALL·E 3.
|
It takes a generated image as input and outputs a prompt that could plausibly have produced it, which can then be used as a base for generating similar images.
|
|
|
⚠️ Disclaimer: This model is **not intended for commercial use**, as the data it was trained on includes images generated by DALL·E 3. It is provided for educational purposes only.
|
|
|
### Usage: |
|
|
|
Loading the model and preprocessor: |
|
```python
import torch
from transformers import BlipForConditionalGeneration, AutoProcessor

# Use a GPU if one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = BlipForConditionalGeneration.from_pretrained("dblasko/blip-dalle3-img2prompt").to(device)
processor = AutoProcessor.from_pretrained("dblasko/blip-dalle3-img2prompt")
```
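
If you have a CUDA GPU, you can optionally load the weights in half precision to cut memory usage and speed up generation. A minimal sketch, assuming a CUDA device is available (otherwise stick with the full-precision loading above):

```python
import torch
from transformers import BlipForConditionalGeneration

# Assumption: a CUDA device is available; float16 halves memory and speeds up inference.
model = BlipForConditionalGeneration.from_pretrained(
    "dblasko/blip-dalle3-img2prompt", torch_dtype=torch.float16
).to("cuda")
```

When loading in float16, cast the processed inputs to the same dtype before generating, e.g. `processor(images=image, return_tensors="pt").to("cuda", torch.float16)`.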
|
|
|
Inference example on an image from `laion/dalle-3-dataset`: |
|
```python
from datasets import load_dataset

dataset = load_dataset("laion/dalle-3-dataset", split="train[0%:1%]")  # small slice for a fast download in this toy example

img_index = 0  # pick any index within the loaded split
example = dataset[img_index]
image = example["image"]
caption = example["caption"]

inputs = processor(images=image, return_tensors="pt").to(device)
pixel_values = inputs.pixel_values

generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"Generated caption: {generated_caption}\nReal caption: {caption}")
```
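
The same pipeline works on your own images. A minimal sketch, reusing the `model`, `processor`, and `device` from above and assuming a placeholder local file `my_dalle3_image.png`:

```python
from PIL import Image

# "my_dalle3_image.png" is a placeholder path; point it at your own DALL·E 3 generation.
image = Image.open("my_dalle3_image.png").convert("RGB")

inputs = processor(images=image, return_tensors="pt").to(device)
generated_ids = model.generate(pixel_values=inputs.pixel_values, max_length=50)
prompt = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"Reconstructed prompt: {prompt}")
```

The reconstructed prompt can then be edited and fed back to DALL·E 3 to generate variations of the original image.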