---
license: apache-2.0
tags:
- onnx
- ort
---

# ONNX and ORT models with quantization of [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased)

[Click here for the Japanese README](README_ja.md)

This repository contains the ONNX and ORT formats of the model [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased), along with quantized versions.

## License

This model is licensed under "apache-2.0". For details, please refer to the original model page: [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased).

## Usage

To use this model, install ONNX Runtime (along with `transformers` and NumPy, which the example imports) and perform inference as shown below.

```python
# Example code
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-base-multilingual-uncased')

# Prepare inputs
text = 'Replace this text with your input.'
inputs = tokenizer(text, return_tensors='np')

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',  # ONNX model
    'ort_models/model.ort',        # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')

    # Load the model. ONNX Runtime infers the ORT format from the '.ort'
    # extension, so both formats load the same way. Passing providers
    # explicitly avoids an error on builds that ship more than the CPU provider.
    session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])

    # Feed only the inputs this model declares
    input_names = {model_input.name for model_input in session.get_inputs()}
    feed = {name: value for name, value in inputs.items() if name in input_names}

    # Run inference
    outputs = session.run(None, feed)

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
```
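
The example above stops at printing the raw outputs. One common follow-up step is to pool the per-token vectors into a single sentence vector. The sketch below is only an illustration under assumptions this README does not confirm: that the first output is the last hidden state with shape `(batch, sequence_length, hidden_size)`, and that mean pooling suits your task. The function name `mean_pool` is hypothetical, and synthetic arrays stand in for real model outputs.

```python
import numpy as np


def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padded positions (hypothetical helper)."""
    # Work in float32 so FP16 model outputs do not lose precision while summing.
    hidden = last_hidden_state.astype(np.float32)
    mask = attention_mask[..., np.newaxis].astype(np.float32)
    summed = (hidden * mask).sum(axis=1)
    # Guard against division by zero for rows that contain no real tokens.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


# Synthetic stand-ins for outputs[0] and inputs['attention_mask'] (assumed shapes).
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])  # the third position is padding
embedding = mean_pool(hidden, mask)
print(embedding.shape)
```

With real data, you would pass `outputs[0]` and `inputs['attention_mask']` from the inference example, after checking the output shapes it prints.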

## Contents of the Model

This repository includes the following models:

### ONNX Models

- `onnx_models/model.onnx`: Original ONNX model converted from [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased)
- `onnx_models/model_opt.onnx`: Optimized ONNX model
- `onnx_models/model_fp16.onnx`: FP16 quantized model
- `onnx_models/model_int8.onnx`: INT8 quantized model
- `onnx_models/model_uint8.onnx`: UINT8 quantized model
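
Reduced-precision variants can produce slightly different outputs from the original model, so it may be worth measuring the drift before choosing one. The sketch below shows one possible way to compare two output arrays. The helper name `compare_outputs` and the two metrics are illustrative choices, not something this repository prescribes, and random arrays stand in for real model outputs.

```python
import numpy as np


def compare_outputs(reference: np.ndarray, candidate: np.ndarray) -> dict:
    """Return simple drift metrics between two outputs (hypothetical helper)."""
    ref = np.asarray(reference, dtype=np.float32).ravel()
    cand = np.asarray(candidate, dtype=np.float32).ravel()
    max_abs_diff = float(np.max(np.abs(ref - cand)))
    denom = float(np.linalg.norm(ref) * np.linalg.norm(cand))
    # Cosine similarity is undefined for zero vectors; report 0.0 in that case.
    cosine = float(ref @ cand) / denom if denom > 0 else 0.0
    return {"max_abs_diff": max_abs_diff, "cosine_similarity": cosine}


# Synthetic data: a reference output and a slightly perturbed copy.
rng = np.random.default_rng(0)
reference = rng.normal(size=(1, 8, 768)).astype(np.float32)
noise = rng.normal(scale=0.01, size=reference.shape).astype(np.float32)
candidate = reference + noise
report = compare_outputs(reference, candidate)
print(report)
```

In practice, `reference` would be an output of `onnx_models/model.onnx` and `candidate` the matching output of a quantized variant, both computed on the same input text.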

### ORT Models

- `ort_models/model.ort`: ORT model using the optimized ONNX model
- `ort_models/model_fp16.ort`: ORT model using the FP16 quantized model
- `ort_models/model_int8.ort`: ORT model using the INT8 quantized model
- `ort_models/model_uint8.ort`: ORT model using the UINT8 quantized model
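
The file layout listed above can be captured in a small lookup so that code selects a variant by name instead of hard-coding paths. This is a minimal sketch: the variant labels (`original`, `optimized`, and so on) and the function `resolve_model_path` are hypothetical names chosen for illustration, while the file paths are the ones listed in this README. Note that `model.ort` corresponds to the optimized ONNX model, and the original (unoptimized) model has no ORT counterpart.

```python
from pathlib import PurePosixPath

# Variant labels are illustrative; the file names come from the lists above.
ONNX_FILES = {
    "original": "model.onnx",
    "optimized": "model_opt.onnx",
    "fp16": "model_fp16.onnx",
    "int8": "model_int8.onnx",
    "uint8": "model_uint8.onnx",
}
ORT_FILES = {
    "optimized": "model.ort",  # built from the optimized ONNX model
    "fp16": "model_fp16.ort",
    "int8": "model_int8.ort",
    "uint8": "model_uint8.ort",
}
LAYOUT = {"onnx": ("onnx_models", ONNX_FILES), "ort": ("ort_models", ORT_FILES)}


def resolve_model_path(fmt: str, variant: str) -> str:
    """Map a format and variant label to a repository-relative path."""
    if fmt not in LAYOUT:
        raise ValueError(f"unknown format: {fmt!r}")
    directory, files = LAYOUT[fmt]
    if variant not in files:
        raise ValueError(f"no {variant!r} variant in {fmt!r} format")
    return str(PurePosixPath(directory) / files[variant])


print(resolve_model_path("ort", "int8"))
```

The returned string can be passed directly to `ort.InferenceSession` once the repository files are available locally.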

## Notes

Please adhere to the license and usage conditions of the original model [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased).

## Contribution

If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.