## Usage

Usage is the same as for jina-clip-v1; the only difference is that jina-clip-v2 supports Matryoshka embeddings via the `truncate_dim` argument of the `encode_text` and `encode_image` methods.

```python
# !pip install transformers einops timm pillow
from transformers import AutoModel

# Initialize the model
model = AutoModel.from_pretrained('jinaai/jina-clip-v2', trust_remote_code=True)

# Sentences to encode
sentences = ['A blue cat', 'A red cat']

# Public image URLs
image_urls = [
    'https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg',
    'https://i.pinimg.com/736x/c9/f2/3e/c9f23e212529f13f19bad5602d84b78b.jpg'
]

# Encode text and images, truncating the embeddings to 512 dimensions
truncate = 512
text_embeddings = model.encode_text(sentences, truncate_dim=truncate)
# encode_image also accepts PIL.Image objects, local file paths, and data URIs
image_embeddings = model.encode_image(image_urls, truncate_dim=truncate)

# Compute similarities
print(text_embeddings[0] @ text_embeddings[1].T)   # text-text similarity
print(text_embeddings[0] @ image_embeddings[0].T)  # text-image cross-modal similarity
print(text_embeddings[0] @ image_embeddings[1].T)  # text-image cross-modal similarity
print(text_embeddings[1] @ image_embeddings[0].T)  # text-image cross-modal similarity
print(text_embeddings[1] @ image_embeddings[1].T)  # text-image cross-modal similarity
```
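Because Matryoshka embeddings concentrate information in their leading dimensions, smaller `truncate_dim` values trade a little accuracy for less storage and faster search. Below is a minimal sketch of comparing text similarity across dimensions, reusing `model` and `sentences` from the snippet above; the dimension list is illustrative, and the re-normalization is defensive in case the truncated outputs are not unit-length:

```python
import numpy as np

# Check how text-text similarity holds up as embeddings are truncated;
# reuses `model` and `sentences` from the snippet above.
for dim in [64, 128, 256, 512, 1024]:
    emb = model.encode_text(sentences, truncate_dim=dim)
    # Re-normalize so the dot product is a cosine similarity
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    print(f'truncate_dim={dim}: similarity={emb[0] @ emb[1]:.4f}')
```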
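As noted in the comment above, `encode_image` also accepts PIL images, local file paths, and data URIs in place of URLs. A small sketch, where `cat.jpg` stands in for a hypothetical local file:

```python
from PIL import Image

# encode_image accepts PIL.Image objects as well as URLs, paths, and data URIs
pil_image = Image.open('cat.jpg')  # hypothetical local file
pil_embedding = model.encode_image([pil_image], truncate_dim=truncate)
print(pil_embedding.shape)
```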