ashok2216/vit-gpt2-image-captioning_COCO_FineTuned

vision-encoder-decoder

image-captioning

ashok2216 committed on Nov 19

Commit 6b49fcc (1 parent: f3ab247)

Update README.md

Files changed (1)

README.md +8 -10

@@ -1,15 +1,12 @@
 ---
 license: apache-2.0
 widget:
-- src: >-
-    https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
-  example_title: Savanna
-- src: >-
-    https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
-  example_title: Football Match
-- src: >-
-    https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
-  example_title: Airport
+  - type: image-to-text
+    example:
+      image_url: "tiger.jpg"
+      prompt: "Describe this image in one sentence."
 language:
 - en
 metrics:
@@ -21,9 +18,10 @@ tags:
 - image_to_text
 - COCO
 - image-captioning
 pipeline_tag: image-to-text
 ---
 # vit-gpt2-image-captioning_COCO_FineTuned
 This repository contains the fine-tuned ViT-GPT2 model for image captioning, trained on the COCO dataset. The model combines a Vision Transformer (ViT) for image feature extraction and GPT-2 for text generation to create descriptive captions from images.
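
The README context lines describe a ViT encoder feeding a GPT-2 decoder. The following is a minimal sketch of how such a checkpoint is commonly loaded and run with the `transformers` library. It assumes the repository ships image-processor and tokenizer files alongside the weights, which this diff does not confirm. The helper names `load_captioner` and `caption` are hypothetical, and `tiger.jpg` is only the placeholder file name from the widget entry.

```python
MODEL_ID = "ashok2216/vit-gpt2-image-captioning_COCO_FineTuned"


def load_captioner(model_id=MODEL_ID):
    """Load model, image processor, and tokenizer (hypothetical helper).

    Assumes the repo contains preprocessor and tokenizer configs.
    The heavy dependency is imported here so the module itself loads
    without `transformers` installed.
    """
    from transformers import (
        AutoTokenizer,
        ViTImageProcessor,
        VisionEncoderDecoderModel,
    )

    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    processor = ViTImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, processor, tokenizer


def caption(image, model, processor, tokenizer, max_length=32):
    """Return one caption string for a single PIL image (hypothetical helper)."""
    # ViT side: resize/normalize the image into a pixel tensor.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # GPT-2 side: decode token ids conditioned on the image features.
    output_ids = model.generate(pixel_values, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()


# Usage (downloads the checkpoint; requires transformers, torch, Pillow):
#   from PIL import Image
#   model, processor, tokenizer = load_captioner()
#   print(caption(Image.open("tiger.jpg").convert("RGB"),
#                 model, processor, tokenizer))
```

Passing the three components in as arguments keeps `caption` independent of how they were loaded, so the same function works with a local checkpoint directory in place of the Hub ID.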