RuntimeError when using onnxruntime
Environment:
onnx 1.17.0
onnxconverter-common 1.14.0
onnxruntime-gpu 1.20.1
skl2onnx 1.17.0
tf2onnx 1.16.1
CUDA Version: 12.3
Hello, I want to use jina-clip-v2 via the ONNX Runtime.
However, when I try to execute the example code, there is a RuntimeError:
RuntimeError: Input must be a list of dictionaries or a single numpy array for input 'pixel_values'.
How can I solve this?
Hey @EdisonEx33! Can you share a code snippet and the error trace?
Thanks for your reply!
By adding images = [Image.open(requests.get(image_url, stream=True).raw) for image_url in image_urls] and pixel_values = np.array(pixel_values), and replacing /share/model/jina-clip-v2/onnx/model.onnx with /share/model/jina-clip-v2/onnx/model_fp16.onnx, I can run jina-clip-v2:
# !pip install transformers onnxruntime pillow
import onnxruntime as ort
from transformers import AutoImageProcessor, AutoTokenizer
import numpy as np
from PIL import Image
import requests
# Load tokenizer and image processor using transformers
tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-clip-v2', trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(
    'jinaai/jina-clip-v2', trust_remote_code=True
)
# Corpus
sentences = [
    'غروب جميل على الشاطئ',  # Arabic
]
# Public image URLs or PIL Images
image_urls = ['https://i.ibb.co/nQNGqL0/beach1.jpg']
# Load images from url
images = [Image.open(requests.get(image_url, stream=True).raw) for image_url in image_urls]
# Tokenize input texts and transform input images
input_ids = tokenizer(sentences, return_tensors='np')['input_ids']
pixel_values = image_processor(images)['pixel_values']
# Convert to a NumPy array before use; session.run rejects other containers here
pixel_values = np.array(pixel_values)
print(pixel_values.shape)
# Start an ONNX Runtime Session
session = ort.InferenceSession('/share/model/jina-clip-v2/onnx/model_fp16.onnx')
# Run inference
from time import time
t0 = time()
output = session.run(None, {'input_ids': input_ids, 'pixel_values': pixel_values})
t1 = time()
print(f"Costs 1: {t1 - t0}")
# Keep the normalised embeddings, first 2 outputs are un-normalized
_, _, text_embeddings, image_embeddings = output
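The fix above hinges on handing session.run a NumPy array rather than whatever container the image processor returns. A minimal sketch of that conversion follows, assuming the processor output is either a list of equally shaped, NumPy-compatible per-image arrays or an already batched array. The helper name to_numpy_batch and the float32 default are illustrative assumptions, not part of the jina-clip-v2 or onnxruntime API.

```python
import numpy as np


def to_numpy_batch(pixel_values, dtype=np.float32):
    """Coerce image-processor output into a single (N, C, H, W) NumPy array.

    Hypothetical helper: float32 is an assumed default, not a documented
    requirement of the model.
    """
    batch = np.asarray(pixel_values, dtype=dtype)
    if batch.ndim == 3:
        # A single (C, H, W) image: add the leading batch dimension
        batch = batch[np.newaxis, ...]
    if batch.ndim != 4:
        raise ValueError(f"expected a 4-D batch, got shape {batch.shape}")
    return batch
```

In the script above this would stand in for the np.array call, e.g. pixel_values = to_numpy_batch(image_processor(images)['pixel_values']).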
However, if I only want to get image embeddings, how can I modify the code?
Simply running output = session.run(None, {'pixel_values': pixel_values}) raises an error:
Traceback (most recent call last):
  File "test_jina_clip.py", line 37, in <module>
    output = session.run(None, {'pixel_values': pixel_values})
  File "~/miniconda3/envs/hy_onnx/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 262, in run
    self._validate_input(list(input_feed.keys()))
  File "~/miniconda3/envs/hy_onnx/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 244, in _validate_input
    raise ValueError(
ValueError: Required inputs (['input_ids']) are missing from input feed (['pixel_values']).
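The ValueError shows that ONNX Runtime checks the feed against every input the graph declares, so it can help to list those inputs before building a feed. A small sketch: session.get_inputs() and the name, shape, and type attributes of its entries come from the onnxruntime Python API, while the helper name describe_inputs is made up for illustration.

```python
def describe_inputs(session):
    """Return (name, shape, type) for every input the ONNX graph declares.

    `session` is expected to be an onnxruntime.InferenceSession; the helper
    itself is hypothetical and only wraps session.get_inputs().
    """
    return [(arg.name, arg.shape, arg.type) for arg in session.get_inputs()]


# Usage, assuming `session` was created as in the script above:
# for name, shape, dtype in describe_inputs(session):
#     print(name, shape, dtype)
```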
@EdisonEx33 You can run the model with zero-sized tensors as follows:
import onnxruntime as ort
import numpy as np
session = ort.InferenceSession('model_fp16.onnx')
input_ids = np.random.randint(0, 10, (1, 16))
pixel_values = np.random.rand(0, 3, 512, 512).astype(np.float32)
print(f'{input_ids.shape=}')
print(f'{pixel_values.shape=}')
text_embeddings, image_embeddings, l2norm_text_embeddings, l2norm_image_embeddings = session.run(
    None, dict(input_ids=input_ids, pixel_values=pixel_values)
)
Note that pixel_values has shape (0, 3, 512, 512), i.e. an empty image batch.
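Following the same idea, an image-only run would presumably pass an empty text batch instead. Below is a hedged sketch of a feed builder that fills in a zero-sized tensor for whichever modality is missing. It assumes the exported graph accepts an empty input_ids batch the same way it accepts empty pixel_values, which this thread does not confirm. The helper name build_feed, the int64 dtype, and the sequence length of 16 are illustrative assumptions; the 512x512 float32 image shape is taken from the reply above.

```python
import numpy as np


def build_feed(input_ids=None, pixel_values=None, seq_len=16, image_size=512):
    """Build a session.run feed, using a zero-sized batch for a missing modality.

    Hypothetical helper. Assumption: the model tolerates a (0, seq_len)
    input_ids tensor just as it tolerates (0, 3, H, W) pixel_values.
    """
    if input_ids is None and pixel_values is None:
        raise ValueError("provide input_ids, pixel_values, or both")
    if input_ids is None:
        # Empty text batch (assumed int64 token ids)
        input_ids = np.zeros((0, seq_len), dtype=np.int64)
    if pixel_values is None:
        # Empty image batch, matching the shape used in the reply
        pixel_values = np.zeros((0, 3, image_size, image_size), dtype=np.float32)
    return {'input_ids': input_ids, 'pixel_values': pixel_values}


# Usage, assuming `session` and `pixel_values` from the earlier script:
# outputs = session.run(None, build_feed(pixel_values=pixel_values))
# image_embeddings = outputs[3]  # normalized image embeddings, per the output order above
```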