---
license: mit
tags:
- onnx
- ort
---
# ONNX and ORT models of [google-bert/bert-base-german-cased](https://huggingface.co/google-bert/bert-base-german-cased), including quantized variants
[Japanese README](README_ja.md)
This repository contains the ONNX and ORT formats of the model [google-bert/bert-base-german-cased](https://huggingface.co/google-bert/bert-base-german-cased), along with quantized versions.
## License
This model is released under the MIT license. For details, please refer to the original model page: [google-bert/bert-base-german-cased](https://huggingface.co/google-bert/bert-base-german-cased).
## Usage
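To use this model, install ONNX Runtime and Transformers (for example, `pip install onnxruntime transformers`), then run inference as shown below. The tokenizer is loaded from the original model repository.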
```python
# Example code
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
import os

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-base-german-cased')

# Prepare inputs
text = 'Replace this text with your input.'
inputs = tokenizer(text, return_tensors='np')

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',  # ONNX model
    'ort_models/model.ort'  # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')

    # Get the model extension
    model_extension = os.path.splitext(model_path)[1]

    # Load the model
    if model_extension == '.ort':
        # Load the ORT format model
        session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
    else:
        # Load the ONNX model
        session = ort.InferenceSession(model_path)

    # Run inference
    outputs = session.run(None, dict(inputs))

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
```
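
The example above passes `dict(inputs)` straight to `session.run`, which works only when the tokenizer's output keys match the inputs the model declares. ONNX Runtime generally reports an error for feed names the graph does not know. The following is a minimal sketch of a helper that keeps only the declared inputs; `build_feed` is a hypothetical name introduced here, and the sketch assumes only that the session exposes `get_inputs()` with a `.name` on each entry.

```python
def build_feed(session, encoded):
    """Keep only the tokenizer outputs that the model declares as inputs."""
    # session.get_inputs() lists the graph inputs; each entry has a .name
    expected = [node.name for node in session.get_inputs()]

    # Fail early with a readable message if the tokenizer did not produce
    # something the model requires.
    missing = [name for name in expected if name not in encoded]
    if missing:
        raise KeyError(f'Tokenizer output lacks model inputs: {missing}')

    # Extra keys (for example an unused token_type_ids) are dropped here.
    return {name: encoded[name] for name in expected}
```

With this helper, the inference call in the loop could be written as `session.run(None, build_feed(session, inputs))`.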
## Contents of the Model
This repository includes the following models:
### ONNX Models
- `onnx_models/model.onnx`: Original ONNX model converted from [google-bert/bert-base-german-cased](https://huggingface.co/google-bert/bert-base-german-cased)
- `onnx_models/model_opt.onnx`: Optimized ONNX model
- `onnx_models/model_fp16.onnx`: FP16 quantized model
- `onnx_models/model_int8.onnx`: INT8 quantized model
- `onnx_models/model_uint8.onnx`: UINT8 quantized model
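
The INT8 and UINT8 labels refer to the integer range used to store quantized values. As a rough illustration of the difference, the sketch below shows generic linear quantization in plain Python: a symmetric mapping to signed 8-bit values and an asymmetric mapping with a zero point to unsigned 8-bit values. This is an assumption-laden teaching example; it is not necessarily the procedure used to produce the files in this repository, and all function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric quantization of floats to the range -127..127."""
    # Fall back to a scale of 1.0 if every value is zero.
    scale = (max(abs(v) for v in values) / 127) or 1.0
    quantized = [min(127, max(-127, round(v / scale))) for v in values]
    return quantized, scale


def quantize_uint8(values):
    """Asymmetric quantization of floats to the range 0..255."""
    # The represented range is widened to include 0.0 so that zero maps
    # exactly onto an integer (the zero point).
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = ((hi - lo) / 255) or 1.0
    zero_point = round(-lo / scale)
    quantized = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point=0):
    """Map integers back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]
```

In both cases the round trip through `dequantize` recovers each value only approximately, with an error on the order of one quantization step (`scale`).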
### ORT Models
- `ort_models/model.ort`: ORT format model converted from the optimized ONNX model
- `ort_models/model_fp16.ort`: ORT format model converted from the FP16 quantized model
- `ort_models/model_int8.ort`: ORT format model converted from the INT8 quantized model
- `ort_models/model_uint8.ort`: ORT format model converted from the UINT8 quantized model
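
To switch between the files listed above without hard-coding paths throughout a script, a small lookup can help. The sketch below copies the file names from the lists above; the dictionary keys (`base`, `opt`, `fp16`, `int8`, `uint8`) and the function name `resolve_model_path` are labels chosen for this example, not part of the repository.

```python
# File names are taken from the lists above; the keys are illustrative labels.
MODEL_FILES = {
    'onnx': {
        'base': 'onnx_models/model.onnx',
        'opt': 'onnx_models/model_opt.onnx',
        'fp16': 'onnx_models/model_fp16.onnx',
        'int8': 'onnx_models/model_int8.onnx',
        'uint8': 'onnx_models/model_uint8.onnx',
    },
    'ort': {
        'opt': 'ort_models/model.ort',
        'fp16': 'ort_models/model_fp16.ort',
        'int8': 'ort_models/model_int8.ort',
        'uint8': 'ort_models/model_uint8.ort',
    },
}


def resolve_model_path(fmt, variant):
    """Return the relative path for a format ('onnx' or 'ort') and variant."""
    try:
        return MODEL_FILES[fmt][variant]
    except KeyError:
        raise ValueError(f'No {fmt!r} model for variant {variant!r}') from None
```

For example, `resolve_model_path('ort', 'int8')` yields the path that can be passed to `ort.InferenceSession` in the usage example.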
## Notes
Please adhere to the license and usage conditions of the original model [google-bert/bert-base-german-cased](https://huggingface.co/google-bert/bert-base-german-cased).
## Contribution
If you find any issues or have improvements, please create an issue or submit a pull request.