---
license: cc-by-nc-nd-4.0
tags:
- Image
- Captioning
- RESNET-152
- LSTM
---
## Introduction
This model follows the architecture proposed in the book "Mastering PyTorch".
It combines a CNN encoder with an LSTM decoder.
The CNN encoder is based on a pretrained ResNet-152 whose last layer is replaced by an embedding layer that outputs a 256-element vector.
The LSTM decoder takes inputs of size 256, uses a hidden layer of size 512, and has an output size equal to the vocabulary size.
The model was trained purely as a learning exercise, so its performance remains modest.
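
To make the layer sizes above concrete, here is a rough parameter count for the layers added on top of the ResNet-152 backbone. It is only a sketch under stated assumptions: the vocabulary size (`10_000`) is a hypothetical placeholder, the decoder is assumed to be a single-layer LSTM with a 256-dimensional word embedding and a linear output projection, and the LSTM formula follows the PyTorch `nn.LSTM` weight layout. The value 2048 is the input width of the final fully connected layer in torchvision's ResNet-152.

```python
from dataclasses import dataclass

# Input width of the final fully connected layer in torchvision's ResNet-152.
RESNET152_FEATURES = 2048


@dataclass
class CaptionerDims:
    embed_size: int = 256     # from the model card
    hidden_size: int = 512    # from the model card
    vocab_size: int = 10_000  # hypothetical placeholder, not the real vocabulary size


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def lstm_params(n_in: int, n_hidden: int) -> int:
    """Single-layer LSTM: 4 gates, each with input weights,
    recurrent weights, and two bias vectors (PyTorch layout)."""
    return 4 * (n_in * n_hidden + n_hidden * n_hidden + 2 * n_hidden)


def layer_params(d: CaptionerDims) -> dict:
    """Parameter counts for the layers added on top of the backbone."""
    return {
        # Replaces the last ResNet layer: 2048 features -> 256-element vector.
        "encoder_embedding": linear_params(RESNET152_FEATURES, d.embed_size),
        # Assumed word embedding feeding the LSTM (one 256-vector per token).
        "word_embedding": d.vocab_size * d.embed_size,
        "lstm": lstm_params(d.embed_size, d.hidden_size),
        # Assumed projection from hidden state to vocabulary scores.
        "output_projection": linear_params(d.hidden_size, d.vocab_size),
    }


if __name__ == "__main__":
    for name, count in layer_params(CaptionerDims()).items():
        print(f"{name}: {count:,}")
```

Only the two vocabulary-dependent entries change with the real vocabulary; the encoder embedding and LSTM counts depend solely on the sizes stated above.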
## Training procedure
For the sake of the exercise, the model was trained for only 5 epochs.
It was trained on the COCO dataset.