Vit-GPT2-COCO2017Flickr-01

This model is a fine-tuned version of NourFakih/image-captioning-Vit-GPT2-Flickr8k on an unspecified dataset (presumably COCO 2017 plus Flickr data, going by the model name). It achieves the following results on the evaluation set:

  • Loss: 0.2789
  • Rouge1: 40.4777
  • Rouge2: 15.156
  • RougeL: 36.8755
  • RougeLsum: 36.8813
  • Gen Len: 11.92

Model description

More information needed

Intended uses & limitations

More information needed
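
The card gives no usage guidance, but the model inherits the standard ViT-encoder/GPT-2-decoder captioning setup from its base model, so inference should follow the usual transformers VisionEncoderDecoderModel pattern. The sketch below is an assumption based on that API, not author-documented usage; the image path is hypothetical.

```python
# Minimal captioning sketch; assumes the standard VisionEncoderDecoderModel
# API (not documented in this card). "example.jpg" is a hypothetical input.
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

model_id = "NourFakih/Vit-GPT2-COCO2017Flickr-01"
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = ViTImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Evaluation captions average ~12 tokens (Gen Len above), so a small
# max_length suffices; beam search is a common choice for captioning.
output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```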

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they map onto transformers training arguments):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
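
As a reproducibility aid, these settings map onto transformers Seq2SeqTrainingArguments roughly as follows. This is a sketch under stated assumptions: the training script is not published, and the 500-step evaluation cadence is inferred from the results table below.

```python
# Hypothetical reconstruction of the reported hyperparameters; the actual
# training script for this model is not published.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="Vit-GPT2-COCO2017Flickr-01",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer default,
    # so no explicit optimizer configuration is needed.
    evaluation_strategy="steps",   # assumption: inferred from the eval logs
    eval_steps=500,                # the results table reports every 500 steps
    predict_with_generate=True,    # required to compute ROUGE and Gen Len
)
```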

Training results

| Training Loss | Epoch | Step  | Gen Len | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum |
|--------------:|------:|------:|--------:|----------------:|--------:|--------:|--------:|----------:|
| 0.2185        | 0.08  | 500   | 11.9627 | 0.2288          | 41.2368 | 15.6218 | 37.5796 | 37.5754   |
| 0.2097        | 0.15  | 1000  | 12.1819 | 0.2266          | 41.0126 | 15.773  | 37.2736 | 37.2843   |
| 0.2067        | 0.23  | 1500  | 11.1865 | 0.2260          | 41.0707 | 15.534  | 37.4934 | 37.5044   |
| 0.1997        | 0.31  | 2000  | 11.4404 | 0.2251          | 41.5488 | 15.8208 | 37.704  | 37.7153   |
| 0.1962        | 0.38  | 2500  | 12.1219 | 0.2241          | 41.6067 | 16.1235 | 37.8372 | 37.8403   |
| 0.1891        | 0.46  | 3000  | 12.0462 | 0.2246          | 41.7488 | 16.5323 | 38.0498 | 38.0689   |
| 0.1942        | 0.54  | 3500  | 11.8842 | 0.2252          | 41.3542 | 15.7955 | 37.8567 | 37.8759   |
| 0.186         | 0.62  | 4000  | 11.6954 | 0.2256          | 41.4582 | 15.8671 | 37.7381 | 37.7557   |
| 0.1822        | 0.69  | 4500  | 11.6962 | 0.2253          | 41.6779 | 15.8426 | 37.9166 | 37.9538   |
| 0.1829        | 0.77  | 5000  | 11.695  | 0.2248          | 41.8987 | 16.4174 | 38.3064 | 38.321    |
| 0.1786        | 0.85  | 5500  | 11.9762 | 0.2251          | 40.9742 | 15.6616 | 37.3227 | 37.3401   |
| 0.1808        | 0.92  | 6000  | 11.7042 | 0.2260          | 41.5023 | 16.0289 | 37.9925 | 37.9843   |
| 0.1758        | 1.0   | 6500  | 11.8888 | 0.2262          | 41.3528 | 16.0559 | 37.8786 | 37.8588   |
| 0.1326        | 1.08  | 7000  | 11.8173 | 0.2394          | 40.7818 | 15.486  | 37.2677 | 37.2794   |
| 0.1291        | 1.15  | 7500  | 11.7969 | 0.2412          | 41.4117 | 16.2382 | 37.9863 | 37.9964   |
| 0.1314        | 1.23  | 8000  | 11.7969 | 0.2436          | 41.1586 | 15.5594 | 37.512  | 37.5293   |
| 0.131         | 1.31  | 8500  | 11.8281 | 0.2427          | 41.1027 | 15.817  | 37.7167 | 37.7216   |
| 0.1322        | 1.38  | 9000  | 11.8927 | 0.2400          | 41.4453 | 16.0873 | 37.7242 | 37.735    |
| 0.1237        | 1.46  | 9500  | 11.8035 | 0.2447          | 40.704  | 15.0054 | 37.1021 | 37.1102   |
| 0.1289        | 1.54  | 10000 | 12.2473 | 0.2441          | 41.0159 | 15.5793 | 37.1366 | 37.1673   |
| 0.1236        | 1.62  | 10500 | 11.6977 | 0.2452          | 40.8137 | 15.3874 | 37.1591 | 37.1672   |
| 0.1241        | 1.69  | 11000 | 11.4181 | 0.2465          | 40.9985 | 15.3879 | 37.1388 | 37.1634   |
| 0.1219        | 1.77  | 11500 | 11.7765 | 0.2463          | 41.1345 | 15.6654 | 37.3921 | 37.4082   |
| 0.1234        | 1.85  | 12000 | 12.1512 | 0.2444          | 41.134  | 15.7004 | 37.3621 | 37.3993   |
| 0.1193        | 1.92  | 12500 | 11.6831 | 0.2466          | 40.568  | 15.1806 | 37.0715 | 37.0779   |
| 0.1148        | 2.0   | 13000 | 11.6546 | 0.2482          | 41.0991 | 15.4567 | 37.4898 | 37.5136   |
| 0.0836        | 2.08  | 13500 | 12.0708 | 0.2717          | 40.4842 | 15.0195 | 36.8428 | 36.859    |
| 0.0869        | 2.15  | 14000 | 12.0069 | 0.2731          | 40.6828 | 14.8559 | 36.8299 | 36.8515   |
| 0.0846        | 2.23  | 14500 | 12.02   | 0.2727          | 40.1785 | 14.8884 | 36.7155 | 36.7025   |
| 0.0829        | 2.31  | 15000 | 12.0535 | 0.2756          | 40.9047 | 15.2085 | 37.1447 | 37.1153   |
| 0.0855        | 2.38  | 15500 | 12.0346 | 0.2757          | 40.8628 | 14.9646 | 37.068  | 37.0583   |
| 0.0859        | 2.46  | 16000 | 11.8796 | 0.2762          | 40.924  | 15.2223 | 37.1443 | 37.1329   |
| 0.0847        | 2.54  | 16500 | 11.9292 | 0.2786          | 40.9447 | 15.2269 | 37.1398 | 37.1511   |
| 0.0831        | 2.62  | 17000 | 12.0958 | 0.2770          | 40.417  | 14.7542 | 36.6568 | 36.6345   |
| 0.0828        | 2.69  | 17500 | 11.845  | 0.2796          | 40.7295 | 15.0389 | 36.9957 | 36.9706   |
| 0.0782        | 2.77  | 18000 | 11.9369 | 0.2796          | 40.7406 | 15.1238 | 36.9906 | 36.9817   |
| 0.0798        | 2.85  | 18500 | 11.9869 | 0.2792          | 40.4692 | 15.0458 | 36.8005 | 36.7953   |
| 0.0794        | 2.92  | 19000 | 11.8985 | 0.2792          | 40.497  | 15.1883 | 36.8923 | 36.8945   |
| 0.0793        | 3.0   | 19500 | 11.92   | 0.2789          | 40.4777 | 15.156  | 36.8755 | 36.8813   |
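
Gen Len is the mean token length of the generated captions. The ROUGE figures are reported as percentages and are presumably computed with the Hugging Face evaluate library, as in the sketch below (an assumption about tooling; the card does not name its metric script):

```python
# Hedged sketch of the usual ROUGE computation for captioning evals;
# the prediction/reference strings here are hypothetical examples.
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["a dog runs across the grass"],
    references=["a brown dog is running through the grass"],
)
# evaluate returns fractions in [0, 1]; the table above reports them
# scaled to percentages (e.g. Rouge1: 40.4777).
print({k: round(v * 100, 4) for k, v in scores.items()})
```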

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2
Model size

  • 239M parameters (Safetensors, F32)