---

license: mit
tags:
- onnx
- ort
---


# ONNX and ORT models of [google-bert/bert-base-german-cased](https://huggingface.co/google-bert/bert-base-german-cased), including quantized variants

[Japanese README](README_ja.md)

This repository contains the model [google-bert/bert-base-german-cased](https://huggingface.co/google-bert/bert-base-german-cased) converted to the ONNX and ORT formats, along with quantized versions.

## License
This model is released under the MIT license. For details, please refer to the original model page: [google-bert/bert-base-german-cased](https://huggingface.co/google-bert/bert-base-german-cased).

## Usage
To use this model, install ONNX Runtime and Transformers (for example, `pip install onnxruntime transformers`), then run inference as shown below.
```python
# Example code
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
import os

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-base-german-cased')

# Prepare inputs
text = 'Replace this text with your input.'
inputs = tokenizer(text, return_tensors='np')

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',    # ONNX model
    'ort_models/model.ort'  # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')
    # Get the model extension
    model_extension = os.path.splitext(model_path)[1]

    # Load the model
    if model_extension == '.ort':
        # Load the ORT format model
        session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
    else:
        # Load the ONNX model
        session = ort.InferenceSession(model_path)

    # Run inference
    outputs = session.run(None, dict(inputs))

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
```
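
The example above leaves further processing to the reader. As one possible next step, the sketch below averages token vectors into a single sentence embedding. It assumes that `outputs[0]` is the last hidden state with shape `(batch, sequence, hidden)`, which this README does not state, so check the printed output shapes first. The helper name `mean_pool` and the synthetic arrays are illustrative only.

```python
import numpy as np


def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padded positions."""
    # Work in float32 so FP16 model outputs do not lose precision in the sums.
    hidden = last_hidden_state.astype(np.float32)
    # Expand the mask to (batch, sequence, 1) so it broadcasts over the hidden axis.
    mask = attention_mask[..., np.newaxis].astype(np.float32)
    summed = (hidden * mask).sum(axis=1)
    # Guard against division by zero for sequences with no unmasked tokens.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


# Synthetic stand-ins for outputs[0] and inputs['attention_mask'] (assumed shapes).
hidden = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 0], [1, 1, 1]], dtype=np.int64)

embeddings = mean_pool(hidden, mask)
print(embeddings.shape)
```

With real model outputs, the call would look like `mean_pool(outputs[0], inputs['attention_mask'])`, provided the assumption about `outputs[0]` holds.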

## Contents of the Model
This repository includes the following models:

### ONNX Models
- `onnx_models/model.onnx`: Original ONNX model converted from [google-bert/bert-base-german-cased](https://huggingface.co/google-bert/bert-base-german-cased)
- `onnx_models/model_opt.onnx`: Optimized ONNX model
- `onnx_models/model_fp16.onnx`: FP16 quantized model
- `onnx_models/model_int8.onnx`: INT8 quantized model
- `onnx_models/model_uint8.onnx`: UINT8 quantized model
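
Reduced-precision variants can produce slightly different outputs from the original model. The sketch below is one simple way to measure that difference between two output arrays, for example the first output of `model.onnx` and of `model_int8.onnx` for the same input. The function name `compare_outputs` and the synthetic arrays are hypothetical, and no accuracy figures for these files are implied.

```python
import numpy as np


def compare_outputs(reference: np.ndarray, candidate: np.ndarray) -> dict:
    """Return the max absolute difference and cosine similarity of two arrays."""
    # Flatten and promote to float64 so FP16 or FP32 inputs are compared alike.
    ref = reference.astype(np.float64).ravel()
    cand = candidate.astype(np.float64).ravel()
    max_abs_diff = float(np.max(np.abs(ref - cand)))
    denom = np.linalg.norm(ref) * np.linalg.norm(cand)
    cosine = float(ref @ cand / denom) if denom > 0 else float('nan')
    return {'max_abs_diff': max_abs_diff, 'cosine_similarity': cosine}


# Synthetic stand-ins for the outputs of two model variants.
reference = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
candidate = reference + np.float32(0.5)

report = compare_outputs(reference, candidate)
print(report)
```

What counts as an acceptable difference depends on the downstream task, so it is worth evaluating on representative inputs rather than relying on a single threshold.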

### ORT Models
- `ort_models/model.ort`: ORT format model converted from the optimized ONNX model
- `ort_models/model_fp16.ort`: ORT format model converted from the FP16 quantized model
- `ort_models/model_int8.ort`: ORT format model converted from the INT8 quantized model
- `ort_models/model_uint8.ort`: ORT format model converted from the UINT8 quantized model
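
To switch between the files listed above without hard-coding paths, a small lookup helper can be convenient. This is a minimal sketch: the file paths come from the lists above, while the variant keys (`'base'`, `'opt'`, `'fp16'`, `'int8'`, `'uint8'`) and the function name `resolve_model_path` are hypothetical conventions chosen for illustration.

```python
from pathlib import Path

# (format, variant) -> relative path, taken from the lists in this README.
MODEL_FILES = {
    ('onnx', 'base'): 'onnx_models/model.onnx',
    ('onnx', 'opt'): 'onnx_models/model_opt.onnx',
    ('onnx', 'fp16'): 'onnx_models/model_fp16.onnx',
    ('onnx', 'int8'): 'onnx_models/model_int8.onnx',
    ('onnx', 'uint8'): 'onnx_models/model_uint8.onnx',
    ('ort', 'opt'): 'ort_models/model.ort',
    ('ort', 'fp16'): 'ort_models/model_fp16.ort',
    ('ort', 'int8'): 'ort_models/model_int8.ort',
    ('ort', 'uint8'): 'ort_models/model_uint8.ort',
}


def resolve_model_path(fmt: str, variant: str, root: str = '.') -> Path:
    """Map a (format, variant) pair to a model path under the repository root."""
    try:
        relative = MODEL_FILES[(fmt, variant)]
    except KeyError:
        raise ValueError(f'No model listed for format={fmt!r}, variant={variant!r}') from None
    return Path(root) / relative


model_path = resolve_model_path('ort', 'int8')
print(model_path.as_posix())
```

The returned path can then be passed as `ort.InferenceSession(str(model_path), providers=['CPUExecutionProvider'])`, assuming the repository has been downloaded to `root`.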

## Notes
Please adhere to the license and usage conditions of the original model [google-bert/bert-base-german-cased](https://huggingface.co/google-bert/bert-base-german-cased).

## Contribution
If you find any issues or have improvements to suggest, please open an issue or submit a pull request.