	---
	license: mit
	base_model:
	- MiaoshouAI/Florence-2-base-PromptGen-v1.5
	- microsoft/Florence-2-base
	---

	# Florence-2-base-PromptGen v1.5 (with config and code updates)

	This is [MiaoshouAI/Florence-2-base-PromptGen-v1.5](https://huggingface.co/MiaoshouAI/Florence-2-base-PromptGen-v1.5) with all of its existing features retained. The supporting configuration and code have been changed so that it works as a drop-in replacement for [Microsoft Florence-2 Model Base](https://huggingface.co/microsoft/Florence-2-base) when using the Transformers library in Python.

	1) `config.json` has been updated with an `auto_map` property and key-value pairs matching those in Florence-2-base, [resolving this issue](https://huggingface.co/MiaoshouAI/Florence-2-base-PromptGen-v1.5/discussions/4).
	2) The Python code, which sits in the repository root in Florence-2-base but in the `florence2_base_ft` folder in [MiaoshouAI/Florence-2-base-PromptGen-v1.5](https://huggingface.co/MiaoshouAI/Florence-2-base-PromptGen-v1.5), has been moved to the repository root, because the subfolder location prevented `AutoProcessor.from_pretrained(..., trust_remote_code=True)` from loading it.
	3) Florence-2-base's `modeling_florence2.py` has been changed so that the class `Florence2LanguageForConditionalGeneration` inherits from `GenerationMixin` in addition to (and listed after) its `PreTrainedModel` base class, for compatibility with Transformers v4.50 onwards.
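Changes 1) and 2) work together: with `trust_remote_code=True`, Transformers reads the `"module.Class"` strings in `auto_map`, and the fix above implies those modules must sit in the repository root. The standard-library sketch below adds an `auto_map` to a local `config.json` and reports any referenced module that is not at the root. The `FLORENCE2_AUTO_MAP` values are an assumption meant to mirror `microsoft/Florence-2-base`, so compare them with that repo's `config.json`; `add_auto_map` and `missing_root_modules` are hypothetical helpers, not part of Transformers.

```python
import json
from pathlib import Path

# Assumed to mirror the auto_map in microsoft/Florence-2-base's config.json; verify before use.
FLORENCE2_AUTO_MAP = {
    "AutoConfig": "configuration_florence2.Florence2Config",
    "AutoModelForCausalLM": "modeling_florence2.Florence2ForConditionalGeneration",
}


def add_auto_map(repo_dir: Path, auto_map: dict = FLORENCE2_AUTO_MAP) -> dict:
    """Merge auto_map into repo_dir/config.json, write the file back and return the config."""
    config_path = Path(repo_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("auto_map", {}).update(auto_map)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


def missing_root_modules(repo_dir: Path, filename: str = "config.json") -> list:
    """List modules referenced by auto_map that have no matching .py file in the repo root."""
    repo_dir = Path(repo_dir)
    config = json.loads((repo_dir / filename).read_text(encoding="utf-8"))
    missing = set()
    for value in config.get("auto_map", {}).values():
        # Most entries are strings; AutoTokenizer entries can be [slow, fast] lists.
        for reference in value if isinstance(value, list) else [value]:
            if not reference or "--" in reference:
                continue  # skip empty slots and references to code in another repo
            # "modeling_florence2.Florence2ForConditionalGeneration" -> "modeling_florence2"
            module_name = reference.rsplit(".", 1)[0]
            if not (repo_dir / f"{module_name}.py").is_file():
                missing.add(module_name)
    return sorted(missing)
```

Passing `filename="preprocessor_config.json"` runs the same check against the processor config, which in Florence-2-base appears to hold the `AutoProcessor` mapping.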

	## About PromptGen
	Florence-2-base-PromptGen is a model trained by [MiaoshouAI](https://huggingface.co/MiaoshouAI) that specializes in generating highly descriptive prompts and tags, both for training image generation models such as [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) and for writing descriptive prompts for image generation.

	Supported prompts include standard Florence-2-base prompts such as `<OD>` for identifying object locations, prompts enhanced by MiaoshouAI including `<CAPTION>`, `<DETAILED_CAPTION>` and `<MORE_DETAILED_CAPTION>`, and additional prompts including `<GENERATE_TAGS>` and `<MIXED_CAPTION>`. See the original repo for more details.
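The task token also decides the shape of the result returned by `processor.post_process_generation`. The sketch below formats that result for display. It assumes that the caption and tag tasks come back as a plain string, and that `<OD>` follows Florence-2-base's documented `{"bboxes": [...], "labels": [...]}` structure; `format_result` is a hypothetical helper, and the token sets only list the prompts named in this README.

```python
# Task tokens named in this README (see the original MiaoshouAI repo for details).
TEXT_TASKS = {
    "<CAPTION>",
    "<DETAILED_CAPTION>",
    "<MORE_DETAILED_CAPTION>",
    "<GENERATE_TAGS>",
    "<MIXED_CAPTION>",
}
DETECTION_TASKS = {"<OD>"}


def format_result(task: str, parsed: dict) -> list:
    """Turn the dict returned by processor.post_process_generation into readable lines.

    Hypothetical helper: assumes text tasks map to a plain string and that <OD> maps
    to {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]} as in Florence-2-base.
    """
    if task not in TEXT_TASKS | DETECTION_TASKS:
        raise ValueError(f"unsupported task token: {task!r}")
    result = parsed[task]
    if task in DETECTION_TASKS:
        # One line per detected object: its label followed by its [x1, y1, x2, y2] box.
        return [f"{label}: {box}" for label, box in zip(result["labels"], result["bboxes"])]
    return [result.strip()]
```

With the names from the example script below, `print("\n".join(format_result(prompt, parsed_answer)))` prints a caption as one line, or one line per detected object for `<OD>`.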

	## How to use:

	To use this model, you can load it directly from the Hugging Face Model Hub.

	To run it as an API server on Windows or Linux, with command-line clients (including fast captioning of all images in folders), you can use [Florence2 Vision API Server](https://github.com/spgoodman/florence2-visionapi).

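	First, install the dependencies (in a virtual environment if you prefer). For example, the commands below install the CUDA 12.4 (`cu124`) builds of PyTorch; choose the index URL that matches your platform: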
	```
	pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu124
	pip3 install transformers pillow einops timm
	```
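After installing, a quick standard-library check can confirm that the packages are present under the distribution names used above, and whether the installed Transformers is v4.50 or later (the release line that change 3) targets). `installed_versions` and `at_least` are hypothetical helper names, and the version comparison is a simple sketch that only looks at the leading numeric parts rather than full version semantics.

```python
import re
from importlib import metadata

# Distribution names taken from the pip3 commands above.
REQUIRED = ["torch", "torchvision", "transformers", "pillow", "einops", "timm"]


def installed_versions(packages=REQUIRED) -> dict:
    """Return {distribution name: installed version, or None if missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def at_least(version: str, minimum: tuple) -> bool:
    """Compare the leading numeric components, e.g. at_least("4.50.1", (4, 50)) -> True."""
    parts = []
    for piece in version.split(".")[: len(minimum)]:
        match = re.match(r"\d+", piece)  # "1+cu124" -> 1, "0rc1" -> 0
        parts.append(int(match.group()) if match else 0)
    return tuple(parts) >= tuple(minimum)


if __name__ == "__main__":
    versions = installed_versions()
    for name, version in versions.items():
        print(f"{name:>12}: {version or 'NOT INSTALLED'}")
    transformers_version = versions["transformers"]
    if transformers_version and at_least(transformers_version, (4, 50)):
        print("transformers is v4.50 or later")
```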

	The following code is based on the `microsoft/Florence-2-base` example, with the model ID and prompt updated and the imports corrected.

	```python
	import requests
	import torch
	from PIL import Image
	from transformers import AutoProcessor, AutoModelForCausalLM


	device = "cuda:0" if torch.cuda.is_available() else "cpu"
	torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

	model = AutoModelForCausalLM.from_pretrained("createveai/Florence-2-base-PromptGen-v1.5", torch_dtype=torch_dtype, trust_remote_code=True).to(device)
	processor = AutoProcessor.from_pretrained("createveai/Florence-2-base-PromptGen-v1.5", trust_remote_code=True)

	# Examples include <CAPTION>, <DETAILED_CAPTION>, <MORE_DETAILED_CAPTION>, <GENERATE_TAGS>, <MIXED_CAPTION>, <OD>
	prompt = "<CAPTION>"

	url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
	image = Image.open(requests.get(url, stream=True).raw)

	inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, torch_dtype)

	generated_ids = model.generate(
	    input_ids=inputs["input_ids"],
	    pixel_values=inputs["pixel_values"],
	    max_new_tokens=1024,
	    num_beams=3,
	    do_sample=False,
	)
	generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
	parsed_answer = processor.post_process_generation(generated_text, task=prompt, image_size=(image.width, image.height))

	print(parsed_answer[prompt])
	```
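To run several task prompts against the same image, the generation and post-processing steps above can be wrapped in a loop. This sketch assumes `model`, `processor`, `image`, `device` and `torch_dtype` are created exactly as in the example; `run_prompts` is a hypothetical name, and the beam-search settings simply repeat those used above.

```python
def run_prompts(model, processor, image, prompts, device, torch_dtype,
                max_new_tokens=1024, num_beams=3):
    """Run each task prompt on one PIL image and return {prompt: post-processed result}."""
    results = {}
    for prompt in prompts:
        # Same preprocessing as the example: tokenize the task token and encode the image.
        inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, torch_dtype)
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=max_new_tokens,
            num_beams=num_beams,
            do_sample=False,
        )
        # Keep special tokens: post_process_generation relies on them to parse the output.
        generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
        parsed = processor.post_process_generation(
            generated_text, task=prompt, image_size=(image.width, image.height)
        )
        results[prompt] = parsed[prompt]
    return results


# Example, reusing the objects from the script above:
# results = run_prompts(model, processor, image, ["<CAPTION>", "<GENERATE_TAGS>", "<OD>"], device, torch_dtype)
# for prompt, answer in results.items():
#     print(prompt, answer)
```

Each prompt is generated separately, so total time grows roughly linearly with the number of prompts.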