Vision Encoder-Decoder for Image Captioning
What can lightweight models do?
A Comparative Evaluation of Transformer-Based Vision Encoder-Decoder Models for Brazilian Portuguese Image Captioning, by LAICSI (IFES).
Note A Space demonstrating the Vision Encoder-Decoder models in action
Note This Vision Encoder-Decoder (VED) is a union of Swin Transformer and DistilBERTimbau, fine-tuned on Flickr30K Portuguese
Note This Vision Encoder-Decoder (VED) is a union of Swin Transformer and GPorTuguese-2, fine-tuned on Flickr30K Portuguese
Note Flickr30K translated into Portuguese with the Google Translator API