ashok2216/vit-gpt2-image-captioning_COCO_FineTuned

Tags: vision-encoder-decoder, image-captioning


ashok2216 committed on Nov 12, 2024

Commit f94e3a3 (verified) · 1 parent: 06a5a69

Update README.md

Files changed (1)

README.md +6 -0

README.md CHANGED

@@ -13,6 +13,12 @@ tags:
 - image-captioning
 ---
+widget:
+  - text: "picture of a futuristic tiger, artstation"
+    output:
+      url:
 # vit-gpt2-image-captioning_COCO_FineTuned
 This repository contains the fine-tuned ViT-GPT2 model for image captioning, trained on the COCO dataset. The model combines a Vision Transformer (ViT) for image feature extraction and GPT-2 for text generation to create descriptive captions from images.
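The README sentence above describes a vision encoder-decoder: ViT turns the image into patch embeddings, and GPT-2 generates caption tokens while cross-attending to them. The sketch below illustrates that wiring under stated assumptions. It assumes the Hugging Face `transformers` library and PyTorch are installed. It builds a tiny, randomly initialized model with `VisionEncoderDecoderModel` rather than loading this repository's checkpoint. All sizes and token ids (`BOS_ID`, `EOS_ID`, `vocab_size=100`, a 32x32 input) are hypothetical values chosen so it runs quickly on CPU, so its output ids are meaningless.

```python
import torch
from transformers import (
    GPT2Config,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

# Hypothetical toy token ids and sizes; NOT the fine-tuned checkpoint's config.
BOS_ID, EOS_ID = 0, 1


def build_tiny_vit_gpt2() -> VisionEncoderDecoderModel:
    """Wire a small, randomly initialized ViT encoder to a GPT-2 decoder."""
    encoder_cfg = ViTConfig(
        image_size=32,
        patch_size=8,  # 32x32 image -> 4x4 = 16 patches, plus one [CLS] token
        num_channels=3,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=64,
    )
    decoder_cfg = GPT2Config(
        vocab_size=100,
        n_positions=64,
        n_embd=32,
        n_layer=2,
        n_head=2,
        bos_token_id=BOS_ID,
        eos_token_id=EOS_ID,
    )
    # Marks the GPT-2 config as a decoder with cross-attention, so each
    # decoder block can attend to the ViT patch embeddings.
    config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(encoder_cfg, decoder_cfg)
    config.decoder_start_token_id = BOS_ID
    config.pad_token_id = EOS_ID
    config.eos_token_id = EOS_ID
    return VisionEncoderDecoderModel(config=config)


def generate_token_ids(model, pixel_values: torch.Tensor, max_new_tokens: int = 5) -> torch.Tensor:
    """Greedy decoding: ViT encodes the image once, GPT-2 emits tokens one at a time."""
    model.eval()
    with torch.no_grad():
        return model.generate(pixel_values=pixel_values, max_new_tokens=max_new_tokens)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = build_tiny_vit_gpt2()
    image = torch.rand(1, 3, 32, 32)  # stand-in for a preprocessed image batch
    ids = generate_token_ids(model, image)
    # Batch of one sequence: the start token followed by up to 5 generated ids.
    print(ids.shape)
```

With trained weights, the same `generate` call would produce real caption ids, which would then be decoded with the matching GPT-2 tokenizer (for example `tokenizer.batch_decode(ids, skip_special_tokens=True)`). Loading this repository through `VisionEncoderDecoderModel.from_pretrained` with its repo id assumes that the repo ships compatible weights and preprocessing files.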