Model	Image Captioning	Visual Question Answering	Image-Text Matching	Human Metric - Explanation of Violation	Auto Metric - Explanation of Violation	Identify - Explanation of Violation
Humans				95		92
Ground-truth Caption _ GPT3 (Oracle)				68	62	74
BLIP2 FlanT5-XXL (Fine-tuned)	177	57	84	27	24	73
BLIP2 FlanT5-XL (Fine-tuned)	174	55	81	15	18	60
Predicted Caption _ GPT3				33	42	59
BLIP2 FlanT5-XXL (Zero-shot)	120	55	71	0	0	50
CLIP ViT-L/14 (Zero-shot)			70			
OFA Large (Zero-shot)	0	38				
CoCa ViT-L-14 MSCOCO (Zero-shot)	102		72			
BLIP Large (Zero-shot)	65	39	77			
BLIP2 FlanT5-XXL (Text only FT)	2	24	94			