Jina CLIP v2: `encode_image` latency is proportional to batch size
I am not able to observe any GPU parallelism benefit from increasing the batch size when encoding images to retrieve embeddings.
There is a `batch_size` param in the `encode_image` method, but processing time scales linearly with batch size. There seems to be no point in supplying the `batch_size` param at all, since increasing the batch also increases GPU memory consumption.
- 1 call of batch 1: `t` ms, `m` additional GPU memory
- 10 calls of batch 1: `10t` ms, `m` additional GPU memory
- 1 call of batch 10: `10t` ms, `m*1.2` additional GPU memory
- 1 call of batch 90: `90t` ms, `m*4` additional GPU memory

-> "10 calls of batch 1" is better than "1 call of batch 10": same total time, less memory.
Is this expected? Why does a bigger batch take proportionally more time? Shouldn't I see some reduction in time per image? Which aspect of the model governs this: the matrix sizes, or the number of GPU cores?
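For context, the measurements above came from a timing loop along these lines. This is a sketch, not the exact script: `dummy_encode` here stands in for the real `model.encode_image(batch, batch_size=...)` call on a loaded jina-clip-v2 model, so it can run anywhere. Note that if the real model runs on CUDA, a fair measurement also needs `torch.cuda.synchronize()` before stopping the clock, since CUDA kernels launch asynchronously.

```python
import time

def benchmark(encode_fn, items, batch_size, repeats=3):
    """Time encode_fn over `items` split into chunks of `batch_size`.

    Returns the best wall-clock time in seconds across `repeats` runs.
    For a real CUDA model, call torch.cuda.synchronize() before reading
    the end time, otherwise queued kernels are not counted.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for i in range(0, len(items), batch_size):
            encode_fn(items[i:i + batch_size])
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in for model.encode_image so this sketch is self-contained;
# in the real run, encode_fn is the model's encode_image method.
def dummy_encode(batch):
    return [[0.0] * 4 for _ in batch]

images = list(range(90))
t1 = benchmark(dummy_encode, images, batch_size=1)
t10 = benchmark(dummy_encode, images, batch_size=10)
# If the GPU were actually batching, t10 should be well below t1;
# what I observe with the real model is t10 ≈ t1.
```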
Hey @lssatvik, thanks for reaching out! Can you share a code snippet so I can reproduce this?