---
license: apache-2.0
tags:
- onnx
- ort
---

# ONNX and ORT models with quantization of [google-bert/bert-large-cased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad)

[Japanese README (日本語READMEはこちら)](README_ja.md)

This repository contains the ONNX and ORT formats of the model [google-bert/bert-large-cased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad), along with quantized versions.

## License
This model is licensed under Apache-2.0. For details, refer to the original model page: [google-bert/bert-large-cased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad).

## Usage
To use this model, install ONNX Runtime and Transformers (for example, `pip install onnxruntime transformers`), then run inference as shown below.
```python
# Example code
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
import os

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-large-cased-whole-word-masking-finetuned-squad')

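# Prepare inputs (this is a question-answering model, so pass a question and a context)
question = 'Replace this text with your question.'
context = 'Replace this text with the passage that contains the answer.'
inputs = tokenizer(question, context, return_tensors='np')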

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',    # ONNX model
    'ort_models/model.ort'  # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')
    # Get the model extension
    model_extension = os.path.splitext(model_path)[1]

    # Load the model
    if model_extension == '.ort':
        # Load the ORT format model
        session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
    else:
        # Load the ONNX model
        session = ort.InferenceSession(model_path)

    # Run inference
    outputs = session.run(None, dict(inputs))

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
```
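
Because the original model is fine-tuned on SQuAD, the raw outputs are usually per-token start and end scores rather than text. The following is a minimal sketch of turning those scores into an answer span. It assumes that `outputs[0]` holds the start logits and `outputs[1]` the end logits, each of shape `(1, sequence_length)`; this ordering has not been verified against the files in this repository. The helper name `best_answer_span` and the `max_answer_len` parameter are hypothetical and introduced only for illustration.

```python
def best_answer_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) for the highest-scoring span with start <= end.

    Both arguments are 1-D sequences of per-token scores, for example
    outputs[0][0] and outputs[1][0] if the model returns start logits first
    (an assumption). Returns None when the sequences are empty.
    """
    best = None
    for i, start_score in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap.
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = float(start_score) + float(end_logits[j])
            if best is None or score > best[2]:
                best = (i, j, score)
    return best


# Toy logits for a 4-token sequence (made-up numbers, not model output).
start, end, score = best_answer_span([0.1, 5.0, 0.2, 0.3], [0.0, 0.1, 4.0, 6.0])
print(start, end, score)
```

With the tokenizer from the example above, `tokenizer.decode(inputs['input_ids'][0][start:end + 1])` would then turn the selected token span into text. This sketch does not exclude spans that fall inside the question or on special tokens, so a production pipeline would need additional filtering.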

## Contents of the Model
This repository includes the following models:

### ONNX Models
- `onnx_models/model.onnx`: Original ONNX model converted from [google-bert/bert-large-cased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad)
- `onnx_models/model_opt.onnx`: Optimized ONNX model
- `onnx_models/model_fp16.onnx`: FP16 quantized model
- `onnx_models/model_int8.onnx`: INT8 quantized model
- `onnx_models/model_uint8.onnx`: UINT8 quantized model

### ORT Models
- `ort_models/model.ort`: ORT model using the optimized ONNX model
- `ort_models/model_fp16.ort`: ORT model using the FP16 quantized model
- `ort_models/model_int8.ort`: ORT model using the INT8 quantized model
- `ort_models/model_uint8.ort`: ORT model using the UINT8 quantized model
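
To switch between these files without hard-coding paths, a small lookup table can help. The sketch below mirrors the file lists above; the variant keys (`'original'`, `'optimized'`, and so on) and the function name `resolve_model_path` are hypothetical names chosen for this example and are not part of the repository.

```python
# File names copied from the lists above; the dictionary keys are illustrative.
ONNX_FILES = {
    'original': 'onnx_models/model.onnx',
    'optimized': 'onnx_models/model_opt.onnx',
    'fp16': 'onnx_models/model_fp16.onnx',
    'int8': 'onnx_models/model_int8.onnx',
    'uint8': 'onnx_models/model_uint8.onnx',
}
ORT_FILES = {
    'optimized': 'ort_models/model.ort',
    'fp16': 'ort_models/model_fp16.ort',
    'int8': 'ort_models/model_int8.ort',
    'uint8': 'ort_models/model_uint8.ort',
}


def resolve_model_path(variant, fmt='onnx'):
    """Return the relative path for a model variant in the given format."""
    tables = {'onnx': ONNX_FILES, 'ort': ORT_FILES}
    if fmt not in tables:
        raise ValueError(f'Unknown format: {fmt!r}')
    if variant not in tables[fmt]:
        raise ValueError(f'No {fmt} file for variant {variant!r}')
    return tables[fmt][variant]


print(resolve_model_path('int8', fmt='ort'))
```

The returned path can be passed directly to `ort.InferenceSession` as in the usage example. Note that there is no ORT counterpart of the unoptimized `model.onnx`, so `resolve_model_path('original', fmt='ort')` raises a `ValueError`.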

## Notes
Please adhere to the license and usage conditions of the original model [google-bert/bert-large-cased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad).

## Contribution
If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.