gmastrapas committed • Commit 6c8a548 • Parent(s): ede5490

feat: add onnx runtime usage
README.md CHANGED
This dual capability makes it an excellent tool for multimodal retrieval-augmented generation (MuRAG) applications, enabling seamless text-to-text and text-to-image searches within a single model.


## Data, Parameters, Training

An updated version of our [technical report](https://arxiv.org/abs/2405.20204) with details on `jina-clip-v2` is coming soon. Stay tuned!


## Usage

</details>

<details>
<summary>via <a href="https://huggingface.co/docs/transformers/en/index">transformers</a></summary>

```python
# !pip install transformers einops timm pillow
from transformers import AutoModel

# Initialize the model
model = AutoModel.from_pretrained('jinaai/jina-clip-v2', trust_remote_code=True)

# Corpus
sentences = [
    'غروب جميل على الشاطئ', # Arabic
    '海滩上美丽的日落', # Chinese
    'Un beau coucher de soleil sur la plage', # French
    'Ein wunderschöner Sonnenuntergang am Strand', # German
    'Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία', # Greek
    'समुद्र तट पर एक खूबसूरत सूर्यास्त', # Hindi
    'Un bellissimo tramonto sulla spiaggia', # Italian
    '浜辺に沈む美しい夕日', # Japanese
    '해변 위로 아름다운 일몰', # Korean
]

# Public image URLs or PIL Images
image_urls = ['https://i.ibb.co/nQNGqL0/beach1.jpg', 'https://i.ibb.co/r5w8hG8/beach2.jpg']

# Choose a matryoshka dimension, set to None to get the full 1024-dim vectors
truncate_dim = 512

text_embeddings = model.encode_text(sentences, truncate_dim=truncate_dim)
image_embeddings = model.encode_image(
    image_urls, truncate_dim=truncate_dim
) # also accepts PIL.Image.Image, local filenames, dataURI

# Encode query text
query = 'beautiful sunset over the beach' # English
query_embeddings = model.encode_text(
    query, task='retrieval.query', truncate_dim=truncate_dim
)

# Text to Image
print('En -> Img: ' + str(query_embeddings @ image_embeddings[0].T))
# Image to Image
print('Img -> Img: ' + str(image_embeddings[0] @ image_embeddings[1].T))
# Text to Text
print('En -> Ar: ' + str(query_embeddings @ text_embeddings[0].T))
print('En -> Zh: ' + str(query_embeddings @ text_embeddings[1].T))
print('En -> Fr: ' + str(query_embeddings @ text_embeddings[2].T))
print('En -> De: ' + str(query_embeddings @ text_embeddings[3].T))
print('En -> Gr: ' + str(query_embeddings @ text_embeddings[4].T))
print('En -> Hi: ' + str(query_embeddings @ text_embeddings[5].T))
print('En -> It: ' + str(query_embeddings @ text_embeddings[6].T))
print('En -> Jp: ' + str(query_embeddings @ text_embeddings[7].T))
print('En -> Ko: ' + str(query_embeddings @ text_embeddings[8].T))
```
</details>

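The similarity scores above are plain `@` dot products, which works because the returned vectors are normalized (see the ONNX section below). If you keep the full 1024-dim vectors with `truncate_dim=None`, you can also apply matryoshka truncation yourself later. A minimal sketch continuing from the snippet above; the re-normalization step is our assumption about how truncated vectors should be used, not a documented API:

```python
import numpy as np

# Encode once at full dimensionality ...
full_embeddings = model.encode_text(sentences, truncate_dim=None)

# ... then keep the first k matryoshka dimensions and re-normalize
k = 256
truncated = full_embeddings[:, :k]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
```
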
<details>
<summary>via <a href="https://sbert.net/">sentence-transformers</a></summary>

```python
# !pip install sentence-transformers einops timm pillow
from sentence_transformers import SentenceTransformer

# Choose a matryoshka dimension
truncate_dim = 512

# Initialize the model
model = SentenceTransformer(
    'jinaai/jina-clip-v2', trust_remote_code=True, truncate_dim=truncate_dim
)

# Corpus
sentences = [
    'غروب جميل على الشاطئ', # Arabic
    '海滩上美丽的日落', # Chinese
    'Un beau coucher de soleil sur la plage', # French
    'Ein wunderschöner Sonnenuntergang am Strand', # German
    'Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία', # Greek
    'समुद्र तट पर एक खूबसूरत सूर्यास्त', # Hindi
    'Un bellissimo tramonto sulla spiaggia', # Italian
    '浜辺に沈む美しい夕日', # Japanese
    '해변 위로 아름다운 일몰', # Korean
]

# Public image URLs or PIL Images
image_urls = ['https://i.ibb.co/nQNGqL0/beach1.jpg', 'https://i.ibb.co/r5w8hG8/beach2.jpg']

# Encode text and images
text_embeddings = model.encode(sentences)
image_embeddings = model.encode(image_urls) # also accepts PIL.Image.Image, local filenames, dataURI

# Encode query text
query = 'beautiful sunset over the beach' # English
query_embeddings = model.encode(query, prompt_name='retrieval.query')
```
</details>

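To rank the corpus against the query without writing the dot products by hand, recent sentence-transformers releases (v3+) expose a `similarity` helper that defaults to cosine similarity. A short sketch continuing from the snippet above:

```python
# Score the English query against the images and the multilingual sentences
print(model.similarity(query_embeddings, image_embeddings))
print(model.similarity(query_embeddings, text_embeddings))
```
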
<details>
<summary>via the <a href="https://onnxruntime.ai/">ONNX Runtime</a></summary>

```python
# !pip install transformers onnxruntime pillow
import onnxruntime as ort
from transformers import AutoImageProcessor, AutoTokenizer

# Load tokenizer and image processor using transformers
tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-clip-v2', trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(
    'jinaai/jina-clip-v2', trust_remote_code=True
)

# Corpus
sentences = [
    'غروب جميل على الشاطئ', # Arabic
    '海滩上美丽的日落', # Chinese
    'Un beau coucher de soleil sur la plage', # French
    'Ein wunderschöner Sonnenuntergang am Strand', # German
    'Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία', # Greek
    'समुद्र तट पर एक खूबसूरत सूर्यास्त', # Hindi
    'Un bellissimo tramonto sulla spiaggia', # Italian
    '浜辺に沈む美しい夕日', # Japanese
    '해변 위로 아름다운 일몰', # Korean
]

# Public image URLs or PIL Images
image_urls = ['https://i.ibb.co/nQNGqL0/beach1.jpg', 'https://i.ibb.co/r5w8hG8/beach2.jpg']

# Tokenize input texts (padded to a common length) and transform input images
input_ids = tokenizer(sentences, padding=True, return_tensors='np')['input_ids']
pixel_values = image_processor(image_urls)['pixel_values']

# Start an ONNX Runtime session
session = ort.InferenceSession('jina-clip-v2/onnx/model.onnx')

# Run inference
output = session.run(None, {'input_ids': input_ids, 'pixel_values': pixel_values})

# Keep the normalized embeddings; the first two outputs are un-normalized
_, _, text_embeddings, image_embeddings = output
```

</details>
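The `InferenceSession` call above assumes a local copy of the repository. If you only need the ONNX weights, one way to fetch them is via `huggingface_hub`; a sketch continuing from the snippet above (the exact filename inside the repo is inferred from the path used there):

```python
# !pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Download the ONNX graph from the Hub into the local cache
model_path = hf_hub_download(repo_id='jinaai/jina-clip-v2', filename='onnx/model.onnx')
session = ort.InferenceSession(model_path)
```

Since the last two outputs are already normalized, plain dot products between `text_embeddings` and `image_embeddings` give cosine similarities, as in the transformers example above.
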
## License

`jina-clip-v2` is listed on AWS & Azure. If you need to use it beyond those platforms or on-premises within your company, note that the model is licensed under CC BY-NC 4.0. For commercial usage inquiries, feel free to [contact us](https://jina.ai/contact-sales/).

## Contact

Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.