Commit d9b7396 (verified) by sayakpaul (HF Staff) · Parent(s): 82a4137

Update README.md

Files changed (1): README.md (+76 -1)
README.md CHANGED
@@ -1 +1,76 @@
- WandB: https://wandb.ai/sayakpaul/shot-categorizer/runs/2ean2v4u
---
license: mit
language:
- en
base_model:
- microsoft/Florence-2-large
---
# Shot Categorizer 🎬

<div align="center">
<img src="assets/header.jpg"/>
</div>

A shot categorization model finetuned from [`microsoft/Florence-2-large`](https://huggingface.co/microsoft/Florence-2-large). It can be used to obtain shot-level metadata (color, lighting, lighting type, composition), which can, in turn, be used to curate datasets of different kinds.

Training configuration:

* Batch size: 16
* Gradient accumulation steps: 4
* Learning rate: 1e-6
* Epochs: 20
* Max grad norm: 1.0
* Hardware: 8xH100s

Training was conducted with FP16 mixed precision and the DeepSpeed ZeRO-2 scheme. The vision tower of the model
was kept frozen during training; a minimal sketch of this setup is shown below.
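The exact training script is not published in this repository, so the following is only a minimal sketch of the two pieces described above: a DeepSpeed-style config for FP16 + ZeRO-2 and freezing the vision tower. The config values mirror the hyperparameters listed above, and the `"vision"` name filter is an assumption; check `model.named_parameters()` for the actual prefixes before relying on it.

```py
from transformers import AutoModelForCausalLM

# Representative DeepSpeed config for FP16 mixed precision + ZeRO stage 2.
# Values mirror the hyperparameters above; this is not the released config file.
# It would typically be passed to the training framework, e.g. via
# TrainingArguments(deepspeed=ds_config) or a deepspeed JSON file.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": 16,
    "gradient_accumulation_steps": 4,
    "gradient_clipping": 1.0,
}

# Load the base model and freeze the image backbone so only the text-side
# parameters receive gradients. The "vision" name filter is an assumption;
# inspect model.named_parameters() to confirm how vision parameters are named.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True
)
for name, param in model.named_parameters():
    if "vision" in name:
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```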
## Inference

```py
from transformers import AutoModelForCausalLM, AutoProcessor
import torch
from PIL import Image
import requests


# Load the finetuned checkpoint; trust_remote_code is required for the Florence-2 architecture.
repo_id = "diffusers-internal-dev/shot-categorizer-v0"
model = (
    AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float16, trust_remote_code=True)
    .to("cuda")
    .eval()
)
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)

# Each task prompt returns one category of shot metadata.
prompts = ["<COLOR>", "<LIGHTING>", "<LIGHTING_TYPE>", "<COMPOSITION>"]
url = "https://huggingface.co/diffusers-internal-dev/shot-categorizer-v0/resolve/main/assets/image_3.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

with torch.inference_mode():
    for prompt in prompts:
        inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda", torch.float16)
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            early_stopping=False,
            do_sample=False,
            num_beams=3,
        )
        generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
        parsed_answer = processor.post_process_generation(
            generated_text, task=prompt, image_size=(image.width, image.height)
        )
        print(parsed_answer)
```
Should print:

```bash
{'<COLOR>': 'Cool, Saturated, Cyan, Blue'}
{'<LIGHTING>': 'Soft light, Low contrast'}
{'<LIGHTING_TYPE>': 'Daylight, Sunny'}
{'<COMPOSITION>': 'Left heavy'}
```
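For dataset-curation workflows it is often convenient to merge the per-prompt outputs into a single record per image. The helper below is a hypothetical sketch (not part of the released model card); `categorize_shot` is an invented name, and it reuses the `model`, `processor`, and `image` objects from the inference example above.

```py
import torch

@torch.inference_mode()
def categorize_shot(image, model, processor,
                    prompts=("<COLOR>", "<LIGHTING>", "<LIGHTING_TYPE>", "<COMPOSITION>")):
    """Hypothetical helper: run every task prompt on one image and merge the results."""
    metadata = {}
    for prompt in prompts:
        inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda", torch.float16)
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            do_sample=False,
            num_beams=3,
        )
        generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
        parsed = processor.post_process_generation(
            generated_text, task=prompt, image_size=(image.width, image.height)
        )
        # Turn e.g. "Cool, Saturated, Cyan, Blue" into ["Cool", "Saturated", "Cyan", "Blue"].
        metadata[prompt.strip("<>").lower()] = [label.strip() for label in parsed[prompt].split(",")]
    return metadata

print(categorize_shot(image, model, processor))
# e.g. {'color': ['Cool', 'Saturated', 'Cyan', 'Blue'], 'lighting': ['Soft light', 'Low contrast'], ...}
```

The resulting dict can then be stored alongside each shot (for example as a JSON sidecar) when filtering or curating a dataset.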