ayousanz committed on
Commit 2db896d · verified · 1 Parent(s): ad616ec

Add ONNX and ORT models with quantization
.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ ort_models/model.ort filter=lfs diff=lfs merge=lfs -text
+ ort_models/model_fp16.ort filter=lfs diff=lfs merge=lfs -text
+ ort_models/model_int8.ort filter=lfs diff=lfs merge=lfs -text
+ ort_models/model_uint8.ort filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,85 @@
+ ---
+ license: apache-2.0
+ tags:
+ - onnx
+ - ort
+ ---
+
+ # ONNX and ORT models with quantization of [google-bert/bert-large-uncased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad)
+
+ [Click here for the Japanese README](README_ja.md)
+
+ This repository contains the ONNX and ORT formats of the model [google-bert/bert-large-uncased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad), along with quantized versions.
+
+ ## License
+ This model is licensed under "apache-2.0". For details, please refer to the original model page: [google-bert/bert-large-uncased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad).
+
+ ## Usage
+ To use this model, install ONNX Runtime and run inference as shown below.
+ ```python
+ # Example code
+ import os
+
+ import onnxruntime as ort
+ from transformers import AutoTokenizer
+
+ # Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-large-uncased-whole-word-masking-finetuned-squad')
+
+ # Prepare inputs
+ text = 'Replace this text with your input.'
+ inputs = tokenizer(text, return_tensors='np')
+
+ # Specify the model paths: test both the ONNX model and the ORT format model
+ model_paths = [
+     'onnx_models/model_opt.onnx',  # ONNX model
+     'ort_models/model.ort',        # ORT format model
+ ]
+
+ # Run inference with each model
+ for model_path in model_paths:
+     print(f'\n===== Using model: {model_path} =====')
+
+     # Load the model; the ORT format model is loaded with an explicit provider
+     if os.path.splitext(model_path)[1] == '.ort':
+         session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
+     else:
+         session = ort.InferenceSession(model_path)
+
+     # Run inference
+     outputs = session.run(None, dict(inputs))
+
+     # Display the output shapes
+     for idx, output in enumerate(outputs):
+         print(f'Output {idx} shape: {output.shape}')
+
+     # Display the raw results (add further processing as needed)
+     print(outputs)
+ ```
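Since this is a SQuAD fine-tuned question-answering model, the two outputs are start and end logits over the input tokens. A minimal sketch of decoding an answer span from them, with synthetic logits standing in for a real `session.run(...)` result (variable names here are illustrative, not part of the model's API):

```python
import numpy as np

# For this SQuAD model, outputs[0] / outputs[1] are start / end logits
# with shape (batch, sequence_length). Synthetic logits stand in for a
# real session.run(...) result in this sketch.
rng = np.random.default_rng(0)
start_logits = rng.normal(size=(1, 12))
end_logits = rng.normal(size=(1, 12))

# Greedy decoding: best start token, then best end token at or after it.
start = int(np.argmax(start_logits[0]))
end = start + int(np.argmax(end_logits[0][start:]))
print('answer token span:', (start, end))
```

With real inputs, `tokenizer.decode(inputs['input_ids'][0][start:end + 1])` would recover the answer text; production decoders typically also mask out question tokens and cap the span length.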
+
+ ## Contents of the Model
+ This repository includes the following models:
+
+ ### ONNX Models
+ - `onnx_models/model.onnx`: Original ONNX model converted from [google-bert/bert-large-uncased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad)
+ - `onnx_models/model_opt.onnx`: Optimized ONNX model
+ - `onnx_models/model_fp16.onnx`: FP16 quantized model
+ - `onnx_models/model_int8.onnx`: INT8 quantized model
+ - `onnx_models/model_uint8.onnx`: UINT8 quantized model
+
+ ### ORT Models
+ - `ort_models/model.ort`: ORT model built from the optimized ONNX model
+ - `ort_models/model_fp16.ort`: ORT model built from the FP16 quantized model
+ - `ort_models/model_int8.ort`: ORT model built from the INT8 quantized model
+ - `ort_models/model_uint8.ort`: ORT model built from the UINT8 quantized model
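As a rough guide to the size trade-off between these variants, the byte counts recorded in this commit's Git LFS pointer files give the compression ratios directly (a quick check; sizes copied from the pointer files in this commit):

```python
# Sizes (bytes) from the Git LFS pointer files added in this commit.
sizes = {
    'model.onnx': 1_340_995_544,      # original
    'model_fp16.onnx': 670_783_496,   # FP16
    'model_int8.onnx': 336_791_930,   # INT8
    'model_uint8.onnx': 336_792_017,  # UINT8
}

base = sizes['model.onnx']
for name, size in sizes.items():
    print(f'{name}: {base / size:.2f}x smaller than the original')
```

FP16 roughly halves the file and INT8/UINT8 shrink it to about a quarter; the accuracy impact of each variant still needs to be validated on your own task.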
+
+ ## Notes
+ Please adhere to the license and usage conditions of the original model [google-bert/bert-large-uncased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad).
+
+ ## Contribution
+ If you find any issues or have improvements, please open an issue or submit a pull request.
README_ja.md ADDED
@@ -0,0 +1,85 @@
+ ---
+ license: apache-2.0
+ tags:
+ - onnx
+ - ort
+ ---
+
+ # ONNX and ORT models with quantization of [google-bert/bert-large-uncased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad)
+
+ [Click here for the English README](README.md)
+
+ This repository contains the original model [google-bert/bert-large-uncased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad) converted to the ONNX and ORT formats, along with quantized versions.
+
+ ## License
+ This model is licensed under "apache-2.0". For details, see the original model page ([google-bert/bert-large-uncased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad)).
+
+ ## Usage
+ To use this model, install ONNX Runtime and run inference as follows.
+ ```python
+ # Sample code
+ import os
+
+ import onnxruntime as ort
+ from transformers import AutoTokenizer
+
+ # Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-large-uncased-whole-word-masking-finetuned-squad')
+
+ # Prepare inputs
+ text = 'Replace this text with your input.'
+ inputs = tokenizer(text, return_tensors='np')
+
+ # Specify the model paths: test both the ONNX model and the ORT format model
+ model_paths = [
+     'onnx_models/model_opt.onnx',  # ONNX model
+     'ort_models/model.ort',        # ORT format model
+ ]
+
+ # Run inference with each model
+ for model_path in model_paths:
+     print(f'\n===== Using model: {model_path} =====')
+
+     # Load the model; the ORT format model is loaded with an explicit provider
+     if os.path.splitext(model_path)[1] == '.ort':
+         session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
+     else:
+         session = ort.InferenceSession(model_path)
+
+     # Run inference
+     outputs = session.run(None, dict(inputs))
+
+     # Display the output shapes
+     for idx, output in enumerate(outputs):
+         print(f'Output {idx} shape: {output.shape}')
+
+     # Display the raw results (add further processing as needed)
+     print(outputs)
+ ```
+
+ ## Contents of the Model
+ This repository includes the following models:
+
+ ### ONNX Models
+ - `onnx_models/model.onnx`: Original ONNX model converted from [google-bert/bert-large-uncased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad)
+ - `onnx_models/model_opt.onnx`: Optimized ONNX model
+ - `onnx_models/model_fp16.onnx`: FP16 quantized model
+ - `onnx_models/model_int8.onnx`: INT8 quantized model
+ - `onnx_models/model_uint8.onnx`: UINT8 quantized model
+
+ ### ORT Models
+ - `ort_models/model.ort`: ORT model built from the optimized ONNX model
+ - `ort_models/model_fp16.ort`: ORT model built from the FP16 quantized model
+ - `ort_models/model_int8.ort`: ORT model built from the INT8 quantized model
+ - `ort_models/model_uint8.ort`: ORT model built from the UINT8 quantized model
+
+ ## Notes
+ Please comply with the license and terms of use of the original model [google-bert/bert-large-uncased-whole-word-masking-finetuned-squad](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad).
+
+ ## Contribution
+ If you find any issues or have improvements, please open an issue or submit a pull request.
onnx_models/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d25f8217b8877a253a602a22b94b6e5fff7a3bcbc8ee61fe91b368d0fc8a0ce4
+ size 1340995544
onnx_models/model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e2349f21ec01f7c90729cd54452e28d1a3aa7f2d7f8dfd91b437fb4fef022da
+ size 670783496
onnx_models/model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fe79991f4bf5aa1dd07ca36edf6deb4fc86a09dac9682f0a751943d78d7bfa4d
+ size 336791930
onnx_models/model_opt.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d4747d76b753d28e50bfee5d96de9fd7408fcf7e0340a96bffa9a260a90c06d2
+ size 1340944121
onnx_models/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0723656cd81b24345b08ad041a71c096445df62893e80c7a147dba1cb5f62e39
+ size 336792017
ort_models/model.ort ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bf9022e142166ac7a2285540ed29af021d6471a81bfd2d6451588153e738967d
+ size 1341319064
ort_models/model_fp16.ort ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7c453fdd900bb436b71556784e8654efc15f5db473506fd6ef2996940c0ef5c4
+ size 671862976
ort_models/model_int8.ort ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff21a9390035d16bff29c703bff494a41a5fcbadc1cbbea96f15910f1e433d8b
+ size 337105440
ort_models/model_uint8.ort ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc8d35143f868897886417d2b82f1ed1d55bb8eb4903da342bb8b0a13ae613a1
+ size 337105440
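Each entry above is a Git LFS pointer rather than the model bytes themselves: three lines recording the spec version, a SHA-256 content hash, and the file size in bytes. A minimal parser for this pointer format (a sketch; `pointer_text` copies the last pointer above):

```python
# Parse the three-line Git LFS pointer format shown above.
pointer_text = (
    'version https://git-lfs.github.com/spec/v1\n'
    'oid sha256:dc8d35143f868897886417d2b82f1ed1d55bb8eb4903da342bb8b0a13ae613a1\n'
    'size 337105440\n'
)

# Each line is a "key value" pair; oid carries "algorithm:hex-digest".
fields = dict(line.split(' ', 1) for line in pointer_text.strip().splitlines())
algo, digest = fields['oid'].split(':', 1)

print(algo, len(digest), int(fields['size']))
```

Cloning with git-lfs installed (or running `git lfs pull`) replaces these pointers with the actual model files.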