michelecafagna26 committed
Commit 9307a3f · 1 Parent(s): 1b02cea

Update README.md

Files changed (1)
  1. README.md +63 -0
README.md CHANGED
@@ -1,3 +1,66 @@
  ---
  license: apache-2.0
+ tags:
+ - image-captioning
+ languages:
+ - en
+ pipeline_tag: image-to-text
+ datasets:
+ - michelecafagna26/hl
+ language:
+ - en
+ metrics:
+ - sacrebleu
+ - rouge
+ library_name: transformers
  ---
+ ## BLIP-base fine-tuned for Image Captioning on High-Level Descriptions of Actions
+
+ [BLIP](https://arxiv.org/abs/2201.12086) base trained on the [HL dataset](https://huggingface.co/datasets/michelecafagna26/hl) for **high-level descriptions of actions**.
+
+ ## Model fine-tuning 🏋️
+
+ Trained for a maximum of 6 epochs with a learning rate of 5e-5,
+ the Adam optimizer,
+ and half-precision (fp16).
+
+ ## Test set metrics 🧾
+
+ | CIDEr  | SacreBLEU | ROUGE-L |
+ |--------|-----------|---------|
+ | 123.07 | 17.16     | 32.16   |
+
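For readers unfamiliar with the ROUGE-L column, the sketch below shows the idea behind the metric: an F-measure built on the longest common subsequence (LCS) between a generated caption and a reference. It is a simplified illustration, not the evaluation code behind the reported numbers; the whitespace tokenization, the helper names `lcs_length` and `rouge_l_f1`, and the reference caption are all assumptions made for this example.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)  # DP row for the previous token of `a`
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 in [0, 1] using lowercase whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical caption pair, made up for illustration only.
score = rouge_l_f1("she is holding a parasol", "she is holding an umbrella")
print(f"{score:.2f}")  # LCS is "she is holding" (3 of 5 tokens each) -> 0.60
```

Standard ROUGE packages add their own tokenization and optional stemming, so their scores will generally differ from this simplified version.
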
+ ## Model in Action 🚀
+
+ ```python
+ import requests
+ from PIL import Image
+ from transformers import BlipProcessor, BlipForConditionalGeneration
+
+ # Note: as written, this loads the base Salesforce captioning checkpoint;
+ # substitute the fine-tuned checkpoint ID to use the model described here.
+ processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
+ model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to("cuda")
+
+ # Download a sample image from the HL dataset and convert it to RGB.
+ img_url = 'https://datasets-server.huggingface.co/assets/michelecafagna26/hl/--/default/train/0/image/image.jpg'
+ raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+ # Preprocess the image and move the tensors to the GPU.
+ inputs = processor(raw_image, return_tensors="pt").to("cuda")
+ pixel_values = inputs.pixel_values
+
+ # Generate one caption with top-k / nucleus sampling.
+ generated_ids = model.generate(
+     pixel_values=pixel_values,
+     max_length=50,
+     do_sample=True,
+     top_k=120,
+     top_p=0.9,
+     early_stopping=True,
+     num_return_sequences=1,
+ )
+
+ captions = processor.batch_decode(generated_ids, skip_special_tokens=True)
+ print(captions[0])
+ # Example output (sampling is stochastic, so results vary):
+ # she's holding a parasol
+ ```
+
+ ## BibTeX and citation info
+
+ ```BibTeX
+ ```