Extremely fast forward pass
Hi all, I am using this model to extract embeddings from images, so I only load the vision encoder via "CLIPVisionModel.from_pretrained('apple/DFN5B-CLIP-ViT-H-14-378')". After some benchmarking, I found that the forward pass of this model is much faster than that of "openai/clip-vit-large-patch14", even though the former has more parameters. The benchmarking was done on a single NVIDIA Tesla T4 GPU:
apple/DFN5B-CLIP-ViT-H-14-378: ~14 ms for 1 image
openai/clip-vit-large-patch14: ~53 ms for 1 image
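For reference, this is roughly the timing loop I used (a minimal sketch, assuming PyTorch; the warmup and "torch.cuda.synchronize()" calls matter because CUDA kernels launch asynchronously, so unsynchronized timings can under-report latency):

```python
import time
import torch

def benchmark_ms(fn, warmup=5, iters=50):
    """Average latency of fn() in milliseconds, with warmup and CUDA sync."""
    with torch.no_grad():
        for _ in range(warmup):
            fn()  # warm up caches / kernel autotuning before timing
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for all queued GPU work to finish
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # ensure the timed GPU work has completed
    return (time.perf_counter() - start) / iters * 1000.0

# Hypothetical usage (model download not run here); note the two models
# expect different input resolutions (378x378 vs 224x224):
# from transformers import CLIPVisionModel
# model = CLIPVisionModel.from_pretrained('apple/DFN5B-CLIP-ViT-H-14-378').cuda().eval()
# pixels = torch.randn(1, 3, 378, 378).cuda()
# print(benchmark_ms(lambda: model(pixel_values=pixels)))
```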
Does anyone know why this might be happening? I suspect it may be related to this model using quickGELU activation functions, but I cannot find anything in the model card confirming that. Thanks for your help!