
sayakpaul committed on Feb 27 · commit d9b7396 (parent 82a4137)

Update README.md

Files changed (1): README.md +76 -1

@@ -1 +1,76 @@
-~~WandB: https://wandb.ai/sayakpaul/shot-categorizer/runs/2ean2v4u~~

+---
+license: mit
+language:
+- en
+base_model:
+- microsoft/Florence-2-large
+---
+# Shot Categorizer 🎬
+<div align="center">
+  <img src="assets/header.jpg"/>
+</div>
+A shot categorization model fine-tuned from [`microsoft/Florence-2-large`](https://huggingface.co/microsoft/Florence-2-large). Given an image of a shot, it predicts
+metadata such as color, lighting, lighting type, and composition, which can then be used to curate datasets of different kinds.
+Training configuration:
+* Batch size: 16
+* Gradient accumulation steps: 4
+* Learning rate: 1e-6
+* Epochs: 20
+* Max grad norm: 1.0
+* Hardware: 8xH100s
+Training used FP16 mixed precision and DeepSpeed ZeRO stage 2. The vision tower of the model
+was kept frozen during training.
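For reference, the settings above could be expressed as a DeepSpeed configuration roughly like the one below. This is a minimal sketch, not the config actually used for training. It assumes that "Batch size: 16" is per GPU, which gives an effective batch size of 16 × 4 × 8 = 512. The optimizer, learning-rate schedule, and number of epochs are left out because they are usually set in the training script, and the card does not say how they were configured.

```python
import json

# Values taken from the training configuration listed above.
MICRO_BATCH_PER_GPU = 16  # assumption: "Batch size: 16" is per GPU
GRAD_ACCUM_STEPS = 4
NUM_GPUS = 8              # 8x H100
MAX_GRAD_NORM = 1.0


def build_deepspeed_config() -> dict:
    """Hypothetical DeepSpeed config for FP16 + ZeRO stage 2 training."""
    # DeepSpeed checks that train_batch_size equals
    # micro_batch_per_gpu * gradient_accumulation_steps * world_size.
    effective_batch = MICRO_BATCH_PER_GPU * GRAD_ACCUM_STEPS * NUM_GPUS
    return {
        "train_batch_size": effective_batch,
        "train_micro_batch_size_per_gpu": MICRO_BATCH_PER_GPU,
        "gradient_accumulation_steps": GRAD_ACCUM_STEPS,
        "gradient_clipping": MAX_GRAD_NORM,
        "fp16": {"enabled": True},
        "zero_optimization": {"stage": 2},
    }


if __name__ == "__main__":
    print(json.dumps(build_deepspeed_config(), indent=2))
```

If "Batch size: 16" instead refers to the global batch size, `train_micro_batch_size_per_gpu` and `train_batch_size` would need to change accordingly.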
+## Inference
+```py
+from transformers import AutoModelForCausalLM, AutoProcessor
+import torch
+from PIL import Image
+import requests
+folder_path = "diffusers-internal-dev/shot-categorizer-v0"
+model = (
+    AutoModelForCausalLM.from_pretrained(folder_path, torch_dtype=torch.float16, trust_remote_code=True)
+    .to("cuda")
+    .eval()
+)
+processor = AutoProcessor.from_pretrained(folder_path, trust_remote_code=True)
+prompts = ["<COLOR>", "<LIGHTING>", "<LIGHTING_TYPE>", "<COMPOSITION>"]
+url = f"https://huggingface.co/{folder_path}/resolve/main/assets/image_3.jpg"
+image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
+# inference_mode() also disables gradient tracking, so a separate no_grad() is unnecessary.
+with torch.inference_mode():
+    for prompt in prompts:
+        inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda", torch.float16)
+        generated_ids = model.generate(
+            input_ids=inputs["input_ids"],
+            pixel_values=inputs["pixel_values"],
+            max_new_tokens=1024,
+            early_stopping=False,
+            do_sample=False,
+            num_beams=3,
+        )
+        generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
+        parsed_answer = processor.post_process_generation(
+            generated_text, task=prompt, image_size=(image.width, image.height)
+        )
+        print(parsed_answer)
+```
+This should print:
+```text
+{'<COLOR>': 'Cool, Saturated, Cyan, Blue'}
+{'<LIGHTING>': 'Soft light, Low contrast'}
+{'<LIGHTING_TYPE>': 'Daylight, Sunny'}
+{'<COMPOSITION>': 'Left heavy'}
+```
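Each answer is a comma-separated string, so the outputs can be turned into tag sets and used to filter shots when curating a dataset. The sketch below uses hypothetical helpers that are not part of the model or `transformers` API. It assumes the dictionaries returned by `post_process_generation` have the shape shown above, and the names `to_tags` and `matches` are illustrative only.

```python
# Hypothetical helpers for turning shot-categorizer outputs into filterable tags.


def to_tags(parsed_answers: list[dict[str, str]]) -> dict[str, set[str]]:
    """Merge per-prompt answers into {task: {tag, ...}}."""
    tags: dict[str, set[str]] = {}
    for answer in parsed_answers:
        for task, text in answer.items():
            # Drop the angle brackets, e.g. "<LIGHTING_TYPE>" -> "LIGHTING_TYPE".
            key = task.strip("<>")
            tags[key] = {t.strip() for t in text.split(",") if t.strip()}
    return tags


def matches(tags: dict[str, set[str]], **required: str) -> bool:
    """True if every required tag is present, e.g. matches(tags, COLOR="Cool")."""
    return all(value in tags.get(task, set()) for task, value in required.items())


if __name__ == "__main__":
    # Outputs copied from the example above.
    outputs = [
        {"<COLOR>": "Cool, Saturated, Cyan, Blue"},
        {"<LIGHTING>": "Soft light, Low contrast"},
        {"<LIGHTING_TYPE>": "Daylight, Sunny"},
        {"<COMPOSITION>": "Left heavy"},
    ]
    tags = to_tags(outputs)
    print(matches(tags, COLOR="Cool", LIGHTING_TYPE="Daylight"))  # True
```

Using sets keeps membership checks cheap when filtering many shots. A real pipeline would also store the image path or ID alongside each tag set.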