hezarai/vit-roberta-fa-image-captioning-flickr30k

arxyzan committed on Oct 17, 2023

Commit 46d20fd • 1 Parent(s): f5fa9c5

Update README.md

Files changed (1)

README.md +1 -1

@@ -6,7 +6,7 @@ metrics:
 pipeline_tag: image-to-text
 ---
-A Persian image captioning model constructed from a ViT + RoBERTa architecture trained on flickr30k-fa.
+A Persian image captioning model constructed from a ViT + RoBERTa architecture trained on [flickr30k-fa](https://www.kaggle.com/datasets/sajjadayobi360/flickrfa) (created by Sajjad Ayoubi).
 The encoder (ViT) was initialized from https://huggingface.co/google/vit-base-patch16-224 and the decoder (RoBERTa) was initialized
 from https://huggingface.co/HooshvareLab/roberta-fa-zwnj-base .