---
license: apache-2.0
tags:
- onnx
- ort
---

# ONNX and ORT models with quantization of [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large)

[Japanese README (日本語)](README_ja.md)

This repository contains ONNX and ORT format conversions of the model [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large), along with quantized versions.

## License

This model is licensed under Apache-2.0. For details, refer to the original model page: [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large).

## Usage

To use this model, install ONNX Runtime (the example also needs `transformers` for the tokenizer) and run inference as shown below.

```python
# Example code
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('answerdotai/ModernBERT-large')

# Prepare inputs as NumPy arrays, which ONNX Runtime accepts directly
text = 'Replace this text with your input.'
inputs = tokenizer(text, return_tensors='np')

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',  # ONNX model
    'ort_models/model.ort',        # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')

    # Load the model. ONNX Runtime treats files ending in .ort as ORT format,
    # so both formats load the same way.
    session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])

    # Run inference
    outputs = session.run(None, dict(inputs))

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
```

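ONNX Runtime raises an error when the feed dictionary contains a name the model does not declare as an input, which can happen if a tokenizer returns extra fields. The following is a minimal sketch of one way to guard against that. The helper name `select_feeds` is hypothetical and not part of this repository or of ONNX Runtime.

```python
from typing import Any, Dict, Iterable, Mapping


def select_feeds(encoded: Mapping[str, Any], input_names: Iterable[str]) -> Dict[str, Any]:
    """Keep only the entries of `encoded` that the model declares as inputs.

    `select_feeds` is a hypothetical helper written for this README.
    """
    names = list(input_names)
    # Fail early with a readable message if the tokenizer did not produce
    # something the model needs.
    missing = [name for name in names if name not in encoded]
    if missing:
        raise KeyError(f"tokenizer output lacks model inputs: {missing}")
    return {name: encoded[name] for name in names}


# Possible usage with the `session` and `inputs` objects from the usage example:
#   names = [i.name for i in session.get_inputs()]
#   outputs = session.run(None, select_feeds(inputs, names))
```

Passing the names in as a plain list keeps the helper independent of ONNX Runtime, so it can be checked without loading a model.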
## Contents of the Model

This repository includes the following models:

### ONNX Models

- `onnx_models/model.onnx`: Original ONNX model converted from [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large)
- `onnx_models/model_opt.onnx`: Optimized ONNX model
- `onnx_models/model_fp16.onnx`: FP16 quantized model
- `onnx_models/model_int8.onnx`: INT8 quantized model
- `onnx_models/model_uint8.onnx`: UINT8 quantized model

### ORT Models

- `ort_models/model.ort`: ORT format model converted from the optimized ONNX model
- `ort_models/model_fp16.ort`: ORT format model converted from the FP16 quantized model
- `ort_models/model_int8.ort`: ORT format model converted from the INT8 quantized model
- `ort_models/model_uint8.ort`: ORT format model converted from the UINT8 quantized model

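When switching between variants, it can help to build the file path from a variant name rather than typing it by hand. The sketch below maps the file names listed above to short keys. The function `model_path` and the keys (`base`, `opt`, `fp16`, `int8`, `uint8`) are illustrative assumptions, not names defined by this repository.

```python
# File names copied from the lists above. The short keys are hypothetical labels.
ONNX_FILES = {
    "base": "model.onnx",
    "opt": "model_opt.onnx",
    "fp16": "model_fp16.onnx",
    "int8": "model_int8.onnx",
    "uint8": "model_uint8.onnx",
}
ORT_FILES = {
    "opt": "model.ort",
    "fp16": "model_fp16.ort",
    "int8": "model_int8.ort",
    "uint8": "model_uint8.ort",
}
TABLES = {"onnx": ("onnx_models", ONNX_FILES), "ort": ("ort_models", ORT_FILES)}


def model_path(variant: str, fmt: str = "onnx") -> str:
    """Return the repository-relative path for a variant in the given format."""
    if fmt not in TABLES:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(TABLES)}")
    directory, files = TABLES[fmt]
    if variant not in files:
        raise ValueError(f"no {fmt} file for variant {variant!r}; expected one of {sorted(files)}")
    return f"{directory}/{files[variant]}"
```

For example, `model_path("int8", "ort")` yields `ort_models/model_int8.ort`, which can be passed to `ort.InferenceSession` as in the usage example.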
## Notes

Please adhere to the license and usage conditions of the original model [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large).

## Contribution

If you find a problem or have an improvement to suggest, please open an issue or submit a pull request.