Commit f1d9cd4 committed by ayousanz
Parent: 57a4dd5

Add ONNX and ORT models with quantization

.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ ort_models/model.ort filter=lfs diff=lfs merge=lfs -text
+ ort_models/model_fp16.ort filter=lfs diff=lfs merge=lfs -text
+ ort_models/model_int8.ort filter=lfs diff=lfs merge=lfs -text
+ ort_models/model_uint8.ort filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,85 @@
---
license: apache-2.0
tags:
- onnx
- ort
---

# ONNX and ORT models with quantization of [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking)

[日本語READMEはこちら](README_ja.md)

This repository contains the ONNX and ORT formats of the model [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking), along with quantized versions.

## License
This model is released under the Apache-2.0 license. For details, please refer to the original model page: [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking).

## Usage
To use this model, install ONNX Runtime (and the `transformers` library for tokenization), then run inference as shown below.
```python
# Example code
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
import os

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-large-cased-whole-word-masking')

# Prepare inputs
text = 'Replace this text with your input.'
inputs = tokenizer(text, return_tensors='np')

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',  # ONNX model
    'ort_models/model.ort'         # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')
    # Get the model extension
    model_extension = os.path.splitext(model_path)[1]

    # Load the model
    if model_extension == '.ort':
        # Load the ORT format model
        session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
    else:
        # Load the ONNX model
        session = ort.InferenceSession(model_path)

    # Run inference
    outputs = session.run(None, dict(inputs))

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
```

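The example above only prints the raw outputs. If you need one fixed-size vector per input (for example, for similarity search), a common recipe is attention-mask-weighted mean pooling over the last hidden state. The snippet below is a minimal sketch, not part of the original example; it assumes the session's first output is the last hidden state with shape (batch, sequence, hidden).

```python
# Minimal sketch (assumption: outputs[0] is the last hidden state).
# Continues from the loop above, so `outputs` and `inputs` are already defined.
last_hidden_state = outputs[0]                   # (batch, seq_len, hidden)
mask = inputs['attention_mask'][..., None]       # (batch, seq_len, 1)
summed = (last_hidden_state * mask).sum(axis=1)  # sum over non-padding tokens
counts = mask.sum(axis=1)                        # number of non-padding tokens
sentence_embedding = summed / counts             # (batch, hidden)
print('Sentence embedding shape:', sentence_embedding.shape)
```
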
## Contents of the Model
This repository includes the following models:

### ONNX Models
- `onnx_models/model.onnx`: Original ONNX model converted from [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking)
- `onnx_models/model_opt.onnx`: Optimized ONNX model
- `onnx_models/model_fp16.onnx`: FP16 quantized model
- `onnx_models/model_int8.onnx`: INT8 quantized model
- `onnx_models/model_uint8.onnx`: UINT8 quantized model

### ORT Models
- `ort_models/model.ort`: ORT model converted from the optimized ONNX model
- `ort_models/model_fp16.ort`: ORT model converted from the FP16 quantized model
- `ort_models/model_int8.ort`: ORT model converted from the INT8 quantized model
- `ort_models/model_uint8.ort`: ORT model converted from the UINT8 quantized model

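For reference, variants like these are typically produced with ONNX Runtime's own tooling. The sketch below is illustrative only and is not necessarily the exact procedure used for this repository: dynamic INT8/UINT8 weight quantization via `onnxruntime.quantization.quantize_dynamic`, followed by ONNX-to-ORT conversion with the `convert_onnx_models_to_ort` tool.

```python
# Illustrative sketch: how quantized and ORT-format variants are commonly generated.
# The exact settings used for the files in this repository may differ.
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic weight quantization of the optimized ONNX model
quantize_dynamic('onnx_models/model_opt.onnx', 'onnx_models/model_int8.onnx',
                 weight_type=QuantType.QInt8)    # INT8 weights
quantize_dynamic('onnx_models/model_opt.onnx', 'onnx_models/model_uint8.onnx',
                 weight_type=QuantType.QUInt8)   # UINT8 weights

# ORT-format files are then built from the ONNX files, e.g.:
#   python -m onnxruntime.tools.convert_onnx_models_to_ort onnx_models/
```
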
## Notes
Please adhere to the license and usage conditions of the original model [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking).

## Contribution
If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.
README_ja.md ADDED
@@ -0,0 +1,85 @@
---
license: apache-2.0
tags:
- onnx
- ort
---

# ONNX and ORT models with quantization of [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking)

[Click here for the English README](README.md)

This repository contains the original model [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking) converted to ONNX and ORT formats, along with quantized versions.

## License
This model is licensed under Apache-2.0. For details, please refer to the original model page: [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking).

## Usage
To use this model, install ONNX Runtime and run inference as shown below.
```python
# Example code
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
import os

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-large-cased-whole-word-masking')

# Prepare inputs
text = 'Replace this text with your input.'
inputs = tokenizer(text, return_tensors='np')

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',  # ONNX model
    'ort_models/model.ort'         # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')
    # Get the model extension
    model_extension = os.path.splitext(model_path)[1]

    # Load the model
    if model_extension == '.ort':
        # Load the ORT format model
        session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
    else:
        # Load the ONNX model
        session = ort.InferenceSession(model_path)

    # Run inference
    outputs = session.run(None, dict(inputs))

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
```

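Quantization trades a small amount of numerical accuracy for a much smaller model, so it is worth checking that quantized outputs stay close enough to the original for your task. The following is a minimal, illustrative sketch (not part of the original README) that runs the optimized FP32 model and the INT8 model on the same input and reports the largest elementwise difference of the first output.

```python
# Illustrative sanity check: compare the optimized FP32 model with the INT8 model.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-large-cased-whole-word-masking')
inputs = dict(tokenizer('Quantization sanity check.', return_tensors='np'))

fp32_session = ort.InferenceSession('onnx_models/model_opt.onnx')
int8_session = ort.InferenceSession('onnx_models/model_int8.onnx')

out_fp32 = fp32_session.run(None, inputs)[0]
out_int8 = int8_session.run(None, inputs)[0]

# How far apart are the two outputs? An acceptable gap depends on the downstream task.
print('max abs difference:', np.max(np.abs(out_fp32 - out_int8)))
```
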
## Contents of the Model
This repository includes the following models:

### ONNX Models
- `onnx_models/model.onnx`: Original ONNX model converted from [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking)
- `onnx_models/model_opt.onnx`: Optimized ONNX model
- `onnx_models/model_fp16.onnx`: FP16 quantized model
- `onnx_models/model_int8.onnx`: INT8 quantized model
- `onnx_models/model_uint8.onnx`: UINT8 quantized model

### ORT Models
- `ort_models/model.ort`: ORT model converted from the optimized ONNX model
- `ort_models/model_fp16.ort`: ORT model converted from the FP16 quantized model
- `ort_models/model_int8.ort`: ORT model converted from the INT8 quantized model
- `ort_models/model_uint8.ort`: ORT model converted from the UINT8 quantized model

## Notes
Please adhere to the license and usage conditions of the original model [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking).

## Contribution
If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.
onnx_models/model.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:20b82f1f41024201fe8f3d813ae267e450f43fc2447fbc757e1f91db634cfa9e
size 1334745048
onnx_models/model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0a3d6701d128b7167312c5c0264489a4f63d2295fcd30d9a3d220931d135d537
size 667658248
onnx_models/model_int8.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e442879cf99ae3ffa6ded66c9255c5dc706b1873e0f579a21d83752beeccc752
size 335229307
onnx_models/model_opt.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ff76be0039e6a28e7d5b827647d580b96ee56e67c88d2f97b6f4ce6080848d66
size 1334693625
onnx_models/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:77fbd18243bae50b6f7b53ae932f60d62f9108b304c71a8c95c78d6538b380d9
size 335229379
ort_models/model.ort ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e5731c878d686e9fc86f75e42b8cd55315481ce9286f0ea8865b2d6881b54b1
size 1335068568
ort_models/model_fp16.ort ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eb696ba8b17728698c1140674eb12211cde4e1e595a97a60e3af653e33c0d0ba
size 668737720
ort_models/model_int8.ort ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dd2de9ddb694ea745edd0b7b5f8b81d10d59a3e0ad1f9c858b2bd85b70068960
size 335542816
ort_models/model_uint8.ort ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9b283a07077a54d42d4254c9c98accec0b89abd43bd0bf07bd0bc5ab6401de87
size 335542816
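All of the model files above are stored with Git LFS, so the repository itself only carries these small pointer files. To fetch an individual model programmatically, the `huggingface_hub` client can be used; the sketch below is illustrative, and the repository id is a placeholder since it is not stated on this page.

```python
# Illustrative sketch: download one LFS-backed model file from the Hugging Face Hub.
# "your-namespace/your-repo" is a placeholder; substitute the actual repository id.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id='your-namespace/your-repo',
    filename='ort_models/model_int8.ort',
)
print('Downloaded to:', local_path)
```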