## Usage

Usage is similar to jina-clip-v1; the only difference is that jina-clip-v2 supports Matryoshka embeddings via the `truncate_dim` argument of the `encode_text` and `encode_image` methods.

```python
# Install the dependencies first (in a notebook):
# !pip install transformers einops timm pillow
from transformers import AutoModel

# Initialize the model
model = AutoModel.from_pretrained('jinaai/jina-clip-v2', trust_remote_code=True)

# Sentences to encode
sentences = ['A blue cat', 'A red cat']

# Public image URLs
image_urls = [
    'https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg',
    'https://i.pinimg.com/736x/c9/f2/3e/c9f23e212529f13f19bad5602d84b78b.jpg'
]

# Encode text and images, truncating the embeddings to 512 dimensions
truncate_dim = 512
text_embeddings = model.encode_text(sentences, truncate_dim=truncate_dim)
# encode_image also accepts PIL.Image objects, local file paths, and data URIs
image_embeddings = model.encode_image(image_urls, truncate_dim=truncate_dim)

# Compute similarities
print(text_embeddings[0] @ text_embeddings[1].T)   # text-text similarity
print(text_embeddings[0] @ image_embeddings[0].T)  # text-image cross-modal similarity
print(text_embeddings[0] @ image_embeddings[1].T)  # text-image cross-modal similarity
print(text_embeddings[1] @ image_embeddings[0].T)  # text-image cross-modal similarity
print(text_embeddings[1] @ image_embeddings[1].T)  # text-image cross-modal similarity
```
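
To see the Matryoshka behavior in action, you can encode the same sentences at several truncation dimensions and compare cosine similarities. This is a minimal sketch that reuses `model` and `sentences` from the snippet above; the dimensions chosen here are illustrative, and it assumes `encode_text` returns NumPy arrays (as the `@`/`.T` usage above suggests):

```python
import numpy as np

# Illustrative truncation dimensions; valid values depend on the
# model's full embedding size
for dim in (64, 256, 512):
    emb = model.encode_text(sentences, truncate_dim=dim)
    # Normalize so the dot product is a cosine similarity
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    print(f'dim={dim}: text-text similarity = {emb[0] @ emb[1]:.4f}')
```

Since Matryoshka embeddings are trained so that leading dimensions carry most of the representation quality, the similarity values should stay close across dimensions, letting you trade embedding size for a small loss in accuracy.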