Update README.md
You can use this model for image captioning tasks with the Hugging Face transformers library.

# Installation
To use this model, you need to install the following libraries:
```bash
pip install torch torchvision transformers
```

The usage example below also loads images with Pillow, so install it as well if needed (`pip install pillow`).
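If you want to confirm the installation succeeded, a quick version check works. This snippet is an optional suggestion, not part of the original README:

```python
# Optional sanity check: confirm the libraries import and print their versions
import torch
import torchvision
import transformers

print(torch.__version__, torchvision.__version__, transformers.__version__)
```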
Then load the model and generate a caption:

```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, GPT2Tokenizer
import torch
from PIL import Image

# Load the fine-tuned model, image processor, and tokenizer
model = VisionEncoderDecoderModel.from_pretrained("ashok2216/vit-gpt2-image-captioning_COCO_FineTuned")
processor = ViTImageProcessor.from_pretrained("ashok2216/vit-gpt2-image-captioning_COCO_FineTuned")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Preprocess the image
image = Image.open("path_to_image.jpg")
inputs = processor(images=image, return_tensors="pt")

# Generate the caption
pixel_values = inputs.pixel_values
output = model.generate(pixel_values)
caption = tokenizer.decode(output[0], skip_special_tokens=True)
print(caption)
```
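The steps above can also be wrapped into a small helper. The sketch below is illustrative and not part of the original model card: the function name `caption_image` and the generation settings (`max_length=32`, `num_beams=4`) are assumptions, chosen because beam search often reads better than greedy decoding.

```python
def caption_image(image_path: str) -> str:
    """Caption a single image using the model, processor, and tokenizer loaded above."""
    image = Image.open(image_path).convert("RGB")  # ViT encoder expects 3-channel input
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # max_length/num_beams are illustrative defaults, not values from the model card
    output = model.generate(pixel_values, max_length=32, num_beams=4)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(caption_image("path_to_image.jpg"))
```

On a GPU machine, moving the model and inputs to the same device (`model.to("cuda")`, `pixel_values.to("cuda")`) speeds up generation.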
Image Input: The input should be an image file. Supported formats include .jpg, …

Output: A text string representing the generated caption for the image.

# Example

For an input image, the model might generate a caption like:
Input Image:

Generated Caption: