ONNX Converted Version of IBM Granite Embedding Model
This repository contains an ONNX-converted version of the Hugging Face model IBM Granite Embedding 125M English (ibm-granite/granite-embedding-125m-english).
Running the Model
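You can run the ONNX model with ONNX Runtime and the Hugging Face tokenizer using the following code (it requires the onnxruntime, transformers, and numpy packages):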
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np
# Define paths
model_path = "./model_uint8.onnx" # Path to ONNX model file
tokenizer_path = "./" # Path to folder containing tokenizer.json and tokenizer_config.json
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
# Load ONNX model using ONNX Runtime
onnx_session = ort.InferenceSession(model_path)
# Example text input
text = "hi."
# Tokenize input
inputs = tokenizer(text, return_tensors="np", truncation=True, padding=True)
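# Prepare input for ONNX model: keep only the inputs the graph declares and cast them to int64
expected_names = {node.name for node in onnx_session.get_inputs()}
onnx_inputs = {key: value.astype(np.int64) for key, value in inputs.items() if key in expected_names}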
# Run inference
outputs = onnx_session.run(None, onnx_inputs)
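# Extract embeddings (e.g., using mean pooling)
last_hidden_state = outputs[0] # Assuming the first output is the last hidden state
# Mean pooling over the sequence dimension, weighted by the attention mask so padding tokens are ignored
mask = inputs["attention_mask"][..., None].astype(last_hidden_state.dtype)
pooled_embedding = (last_hidden_state * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)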
print(f"Embedding: {pooled_embedding}")
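When several texts are embedded in one batch, shorter texts are padded, and a plain mean over the sequence dimension would mix padding vectors into the result. The sketch below shows one way to pool with the attention mask and then compare embeddings by cosine similarity. It uses a small hand-made array in place of real model output, and the helper names (mean_pool, l2_normalize, cosine_similarity_matrix) are illustrative, not part of this repository or of ONNX Runtime. Mean pooling is only the example used above; the base model's card is the place to confirm which pooling strategy the model was trained with.

```python
import numpy as np


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, ignoring padded positions."""
    # Expand the mask to (batch, seq_len, 1) so it broadcasts over the hidden size.
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def l2_normalize(embeddings):
    """Scale each row to unit length."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between all rows."""
    unit = l2_normalize(embeddings)
    return unit @ unit.T


# Toy stand-in for outputs[0]: batch of 2 sequences, 3 tokens each, hidden size 2.
hidden = np.array([
    [[1.0, 0.0], [3.0, 0.0], [100.0, 100.0]],  # third token is padding
    [[0.0, 2.0], [0.0, 4.0], [0.0, 6.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])

pooled = mean_pool(hidden, mask)
similarity = cosine_similarity_matrix(pooled)
print(pooled)
print(similarity)
```

With real model output, `hidden` would be `outputs[0]` and `mask` would be `inputs["attention_mask"]` from the code above.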
Model tree for rokeya71/granite-embedding-125m-english-onnx
Base model
ibm-granite/granite-embedding-125m-english