Jina CLIP v2: `encode_image` latency is proportional to batch size
I am not able to observe any GPU parallelism benefit from increasing the batch size when encoding images to retrieve embeddings.
There is a `batch_size` param in the `encode_image` method, but processing time scales linearly with batch size. There seems to be no point in supplying the `batch_size` param at all, since increasing the batch also increases GPU memory consumption.
- 1 call of batch 1: `t` ms, `m` additional GPU memory
- 10 calls of batch 1: `10t` ms, `m` additional GPU memory
- 1 call of batch 10: `10t` ms, `m*1.2` additional GPU memory
- 1 call of batch 90: `90t` ms, `m*4` additional GPU memory

-> "10 calls of batch 1" is better than "1 call of batch 10": same total time, less memory.
Is this expected? Why does a bigger batch take proportionally more time? Shouldn't I see some reduction in time per image? Which aspect of the model governs this: the matrix sizes, or the number of GPU cores?
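For context, the measurements above came from a timing loop along these lines. This is a sketch, not the exact script: `dummy_encode` here stands in for the real `model.encode_image(batch, batch_size=...)` call on a loaded jina-clip-v2 model, so it can run anywhere. Note that if the real model runs on CUDA, a fair measurement also needs `torch.cuda.synchronize()` before stopping the clock, since CUDA kernels launch asynchronously.

```python
import time

def benchmark(encode_fn, items, batch_size, repeats=3):
    """Time encode_fn over `items` split into chunks of `batch_size`.

    Returns the best wall-clock time in seconds across `repeats` runs.
    For a real CUDA model, call torch.cuda.synchronize() before reading
    the end time, otherwise queued kernels are not counted.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for i in range(0, len(items), batch_size):
            encode_fn(items[i:i + batch_size])
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in for model.encode_image so this sketch is self-contained;
# in the real run, encode_fn is the model's encode_image method.
def dummy_encode(batch):
    return [[0.0] * 4 for _ in batch]

images = list(range(90))
t1 = benchmark(dummy_encode, images, batch_size=1)
t10 = benchmark(dummy_encode, images, batch_size=10)
# If the GPU were actually batching, t10 should be well below t1;
# what I observe with the real model is t10 ≈ t1.
```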
Hey @lssatvik, thanks for reaching out! Can you share a code snippet so I can reproduce this?