NikshepShetty/Florence-2-pixelprose

image-captioning


NikshepShetty committed on Aug 3, 2024

Commit 7c5d9f0 (verified) · 1 Parent(s): 37bf3ba

Update README.md

Files changed (1)

README.md +38 -1


@@ -10,6 +10,27 @@ tags:
   - adapter
   - image-captioning
   - peft
 ---
 # Florence-2 PixelProse LoRA Adapter
@@ -57,4 +78,20 @@ This code demonstrates how to:
 2. Load the LoRA adapter
 3. Process an image and generate a detailed caption
-Note: Make sure you have the required libraries installed: transformers, peft, einops, flash_attn, timm, Pillow, and requests.

   - adapter
   - image-captioning
   - peft
+model-index:
+- name: Florence-2-DOCCI-FT
+  results:
+  - task:
+      type: image-to-text
+      name: Image Captioning
+    dataset:
+      name: foundation-multimodal-models/DetailCaps-4870
+      type: other
+    metrics:
+    - type: meteor
+      value: 0.250
+    - type: bleu
+      value: 0.155
+    - type: cider
+      value: 0.039
+    - type: capture
+      value: 0.555
+    - type: rouge-l
+      value: 0.298
 ---
 # Florence-2 PixelProse LoRA Adapter
 2. Load the LoRA adapter
 3. Process an image and generate a detailed caption
+Note: Make sure you have the required libraries installed: transformers, peft, einops, flash_attn, timm, Pillow, and requests.
+## Evaluation results
+Our LoRA adapter improves on the base Florence-2 model across all metrics with the MORE_DETAILED_CAPTION task tag, evaluated on 1000 images from the foundation-multimodal-models/DetailCaps-4870 dataset:
+| Metric  | Base Model | PixelProse Adapter | Improvement |
+|---------|------------|---------------------|-------------|
+| METEOR  | 0.213      | 0.250               | +17.4%      |
+| BLEU    | 0.110      | 0.155               | +40.9%      |
+| CIDEr   | 0.031      | 0.039               | +25.8%      |
+| CAPTURE | 0.546      | 0.555               | +1.6%       |
+| ROUGE-L | 0.275      | 0.298               | +8.4%       |
+These results indicate that our LoRA adapter improves the detailed captions produced by the Florence-2 base model, with the largest relative gains on BLEU (+40.9%) and CIDEr (+25.8%) and the smallest on CAPTURE (+1.6%).
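
The "Improvement" column in the evaluation table above appears to be the relative change of the adapter score over the base score, i.e. `(adapter - base) / base`. The following is a minimal sketch for re-deriving that column, assuming this interpretation; the `scores` dictionary and the `relative_improvement` helper are illustrative names and not part of the repository.

```python
# Metric values copied from the evaluation table: (base model, PixelProse adapter).
scores = {
    "METEOR": (0.213, 0.250),
    "BLEU": (0.110, 0.155),
    "CIDEr": (0.031, 0.039),
    "CAPTURE": (0.546, 0.555),
    "ROUGE-L": (0.275, 0.298),
}


def relative_improvement(base: float, adapted: float) -> float:
    """Return the percentage change of `adapted` relative to `base`.

    Hypothetical helper: assumes the table reports relative (not absolute) gains.
    """
    return (adapted - base) / base * 100.0


# Compute the relative gain for every metric in the table.
improvements = {
    name: relative_improvement(base, adapted)
    for name, (base, adapted) in scores.items()
}

# Print one line per metric, formatted like the "Improvement" column.
for name, pct in improvements.items():
    print(f"{name:8s} {pct:+.1f}%")
```

Under this assumption, the rounded values match the percentages reported in the table, which suggests the column was computed as a relative gain rather than an absolute difference in score.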