Jina CLIP v2: encode_image latency is proportional to batch size

#33
by lssatvik - opened

I don't see any GPU parallelism benefit from increasing the batch size when encoding images to get embeddings.

There is a batch size parameter in the encode_image method, but processing time scales linearly with batch size. That makes the parameter pointless, since a larger batch also consumes more GPU memory.

1 batch of size 1: t ms, m additional GPU memory
10 batches of size 1: 10t ms, m additional GPU memory
1 batch of size 10: 10t ms, 1.2m additional GPU memory
1 batch of size 90: 90t ms, 4m additional GPU memory

So 10 batches of size 1 beat 1 batch of size 10: the same total time with less GPU memory.
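Expressed as per-image latency, the figures above show no batching gain at all. Writing $T(B)$ for the wall-clock time of one call with batch size $B$ (notation introduced here for illustration), the reported numbers give a constant time per image, whereas the expectation in this post is that $T(B)/B$ should shrink as $B$ grows:

$$
\frac{T(1)}{1} = t, \qquad \frac{T(10)}{10} = \frac{10t}{10} = t, \qquad \frac{T(90)}{90} = \frac{90t}{90} = t
$$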

Is this expected? Why does a bigger batch take proportionally longer? Shouldn't the per-image time drop as the batch grows? Which aspect of the model governs this: the size of the matrices, or the number of GPU cores?

Hey @lssatvik, thanks for reaching out! Can you share a code snippet so I can reproduce this?
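A reproduction mainly needs careful timing. Below is a minimal, hypothetical harness (the names `time_encode` and `fake_encode` are illustrative, not part of Jina's code) that splits inputs into batches, excludes a warm-up call, and accepts an optional `sync` callback so asynchronous GPU work is finished before the clock is read. It uses only the standard library; the demo runs a CPU stand-in encoder with a fixed per-call cost plus a per-item cost, which is an assumption and not how the real model behaves.

```python
import time
from typing import Callable, Optional, Sequence


def time_encode(
    encode_fn: Callable[[Sequence], object],
    items: Sequence,
    batch_size: int,
    sync: Optional[Callable[[], None]] = None,
    warmup: int = 1,
) -> dict:
    """Time encode_fn over `items`, split into chunks of `batch_size`.

    `sync` should block until queued GPU work is done (for example
    torch.cuda.synchronize); without it, asynchronous kernels can make
    timings misleading.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if len(items) == 0:
        raise ValueError("items must not be empty")
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

    # Warm-up calls are not timed: the first call often pays one-off costs.
    for _ in range(warmup):
        encode_fn(batches[0])
    if sync is not None:
        sync()

    start = time.perf_counter()
    for batch in batches:
        encode_fn(batch)
    if sync is not None:
        sync()  # wait for outstanding work before stopping the clock
    total_ms = (time.perf_counter() - start) * 1000.0

    return {
        "batch_size": batch_size,
        "num_batches": len(batches),
        "total_ms": total_ms,
        "per_item_ms": total_ms / len(items),
    }


if __name__ == "__main__":
    # Hypothetical stand-in encoder: 5 ms fixed cost per call + 1 ms per item.
    def fake_encode(batch):
        time.sleep(0.005 + 0.001 * len(batch))

    data = list(range(90))
    for bs in (1, 10, 90):
        r = time_encode(fake_encode, data, bs)
        print(f"batch={bs:3d} batches={r['num_batches']:3d} "
              f"per_item={r['per_item_ms']:.2f} ms")
```

To time the real model, `encode_fn` could be something like `lambda imgs: model.encode_image(imgs, batch_size=len(imgs))` with `sync=torch.cuda.synchronize`, assuming `model` was loaded via `AutoModel.from_pretrained('jinaai/jina-clip-v2', trust_remote_code=True)` and that the batch size keyword is named `batch_size`. Comparing `per_item_ms` across batch sizes is what shows whether batching helps.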
