---
license: apache-2.0
tags:
- onnx
- ort
---
# ONNX and ORT models with quantization of [google-bert/bert-large-cased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad)
[Japanese README (日本語)](README_ja.md)
This repository contains the ONNX and ORT formats of the model [google-bert/bert-large-cased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad), along with quantized versions.
## License
The license for this model is "apache-2.0". For details, please refer to the original model page: [google-bert/bert-large-cased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad).
## Usage
To use this model, install ONNX Runtime, Transformers, and NumPy (for example, `pip install onnxruntime transformers numpy`), then run inference as shown below.
```python
# Example code
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
import os

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-large-cased-whole-word-masking-finetuned-squad')

# Prepare inputs
text = 'Replace this text with your input.'
inputs = tokenizer(text, return_tensors='np')

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',  # ONNX model
    'ort_models/model.ort',        # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')
    # Get the model extension
    model_extension = os.path.splitext(model_path)[1]

    # Load the model
    if model_extension == '.ort':
        # Load the ORT format model
        session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
    else:
        # Load the ONNX model
        session = ort.InferenceSession(model_path)

    # Run inference
    outputs = session.run(None, dict(inputs))

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
```
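The example above prints raw outputs. Because the original model is fine-tuned on SQuAD for extractive question answering, those outputs are usually turned into an answer span. The following is a minimal sketch of that step, not part of this repository: `best_answer_span` and `max_answer_len` are hypothetical names, and the logits in the demo are synthetic values rather than real model output.

```python
from typing import Sequence, Tuple


def best_answer_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest combined score.

    Only spans with start <= end and at most max_answer_len tokens are
    considered. The score of a span is start_logits[start] + end_logits[end].
    """
    if len(start_logits) != len(end_logits):
        raise ValueError("start_logits and end_logits must have the same length")

    best_score = float("-inf")
    best = (0, 0)
    for start, start_score in enumerate(start_logits):
        # Limit the search to spans of at most max_answer_len tokens.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best


# Synthetic logits for a 6-token sequence (illustration only).
demo_start = [0.1, 0.2, 4.0, 0.3, 0.1, 0.0]
demo_end = [0.0, 0.1, 0.5, 3.5, 0.2, 0.1]
print(best_answer_span(demo_start, demo_end))
```

To apply this to real output, encode a question and context pair with `tokenizer(question, context, return_tensors='np')`. Assuming the exported model returns start logits first and end logits second, the call would be `start, end = best_answer_span(outputs[0][0].tolist(), outputs[1][0].tolist())`, and the answer text could then be recovered with `tokenizer.decode(inputs['input_ids'][0][start:end + 1])`.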
## Contents of the Model
This repository includes the following models:
### ONNX Models
- `onnx_models/model.onnx`: Original ONNX model converted from [google-bert/bert-large-cased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad)
- `onnx_models/model_opt.onnx`: Optimized ONNX model
- `onnx_models/model_fp16.onnx`: FP16 quantized model
- `onnx_models/model_int8.onnx`: INT8 quantized model
- `onnx_models/model_uint8.onnx`: UINT8 quantized model
### ORT Models
- `ort_models/model.ort`: ORT format model converted from the optimized ONNX model
- `ort_models/model_fp16.ort`: ORT format model converted from the FP16 quantized model
- `ort_models/model_int8.ort`: ORT format model converted from the INT8 quantized model
- `ort_models/model_uint8.ort`: ORT format model converted from the UINT8 quantized model
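When switching between these files in a script, a small lookup table can help avoid typos in the paths. The sketch below is only an illustration: the file names are copied from the lists above, while `MODEL_FILES`, `resolve_model_path`, and the variant keys (`"original"`, `"optimized"`, and so on) are hypothetical names that do not come from this repository.

```python
from pathlib import Path

# File names copied from the lists above; the (format, variant) keys are illustrative.
MODEL_FILES = {
    ("onnx", "original"): "onnx_models/model.onnx",
    ("onnx", "optimized"): "onnx_models/model_opt.onnx",
    ("onnx", "fp16"): "onnx_models/model_fp16.onnx",
    ("onnx", "int8"): "onnx_models/model_int8.onnx",
    ("onnx", "uint8"): "onnx_models/model_uint8.onnx",
    ("ort", "optimized"): "ort_models/model.ort",
    ("ort", "fp16"): "ort_models/model_fp16.ort",
    ("ort", "int8"): "ort_models/model_int8.ort",
    ("ort", "uint8"): "ort_models/model_uint8.ort",
}


def resolve_model_path(fmt: str, variant: str, root: str = ".") -> Path:
    """Return the path of a model file, given its format and variant."""
    try:
        relative = MODEL_FILES[(fmt, variant)]
    except KeyError:
        available = ", ".join(f"{f}/{v}" for f, v in sorted(MODEL_FILES))
        raise ValueError(
            f"Unknown model {fmt}/{variant}; available: {available}"
        ) from None
    return Path(root) / relative


print(resolve_model_path("ort", "int8").as_posix())
```

The returned path can be passed to `ort.InferenceSession(str(path), providers=['CPUExecutionProvider'])` in the same way as the hard-coded paths in the usage example.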
## Notes
Please adhere to the license and usage conditions of the original model [google-bert/bert-large-cased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad).
## Contribution
If you find any issues or have improvements, please create an issue or submit a pull request.