Introduction
This model follows the architecture proposed in the book "Mastering PyTorch". It combines a CNN encoder with an LSTM decoder.
The CNN encoder is a pretrained ResNet-152 whose last layer is replaced by a layer that outputs a 256-element embedding vector. The LSTM decoder takes 256-dimensional inputs, uses a hidden size of 512, and has an output size equal to the vocabulary size.
The model was trained purely as a learning exercise, so its performance remains relatively modest.
Training procedure
For the sake of the exercise, the model was trained for only 5 epochs.
It was trained on the COCO dataset.