Commit 5291c19 · verified · committed by qaihm-bot · 1 Parent(s): e42b3fc

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +70 -28
README.md CHANGED
@@ -14,7 +14,7 @@ tags:

  End-to-end text recognition approach with pre-trained image transformer and text transformer models for both image understanding and wordpiece-level text generation.

- This model is an implementation of TrOCR found [here](https://huggingface.co/microsoft/trocr-small-stage1).
  This repository provides scripts to run TrOCR on Qualcomm® devices.
  More details on model performance across various devices, can be found
  [here](https://aihub.qualcomm.com/models/trocr).
@@ -31,17 +31,53 @@ More details on model performance across various devices, can be found
  - Number of parameters (TrOCRDecoder): 38.3M
  - Model size (TrOCRDecoder): 146 MB




- | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Precision | Primary Compute Unit | Target Model
- | ---|---|---|---|---|---|---|---|
- | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | TFLite | 66.632 ms | 7 - 9 MB | FP16 | NPU | [TrOCREncoder.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.tflite)
- | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | TFLite | 2.71 ms | 0 - 2 MB | FP16 | NPU | [TrOCRDecoder.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.tflite)
- | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | QNN Model Library | 68.321 ms | 0 - 21 MB | FP16 | NPU | [TrOCREncoder.so](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.so)
- | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | QNN Model Library | 3.068 ms | 0 - 270 MB | FP16 | NPU | [TrOCRDecoder.so](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.so)
-
-

  ## Installation
 
@@ -97,23 +133,25 @@ device. This script does the following:
  ```bash
  python -m qai_hub_models.models.trocr.export
  ```
-
  ```
- Profile Job summary of TrOCREncoder
- --------------------------------------------------
- Device: Snapdragon X Elite CRD (11)
- Estimated Inference Time: 47.46 ms
- Estimated Peak Memory Range: 1.70-1.70 MB
- Compute Units: NPU (443) | Total (443)
-
- Profile Job summary of TrOCRDecoder
- --------------------------------------------------
- Device: Snapdragon X Elite CRD (11)
- Estimated Inference Time: 3.02 ms
- Estimated Peak Memory Range: 7.05-7.05 MB
- Compute Units: NPU (356) | Total (356)
-
-
  ```

 
@@ -241,15 +279,19 @@ provides instructions on how to use the `.so` shared library in an Android appl
  Get more details on TrOCR's performance across various devices [here](https://aihub.qualcomm.com/models/trocr).
  Explore all available models on [Qualcomm® AI Hub](https://aihub.qualcomm.com/)

  ## License
- - The license for the original implementation of TrOCR can be found
- [here](https://github.com/microsoft/unilm/blob/master/LICENSE).
- - The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf)

  ## References
  * [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282)
  * [Source Model Implementation](https://huggingface.co/microsoft/trocr-small-stage1)

  ## Community
  * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI.
  * For questions or feedback please [reach out to us](mailto:[email protected]).
 
@@ -14,7 +14,7 @@ tags:

  End-to-end text recognition approach with pre-trained image transformer and text transformer models for both image understanding and wordpiece-level text generation.

+ This model is an implementation of TrOCR found [here]({source_repo}).
  This repository provides scripts to run TrOCR on Qualcomm® devices.
  More details on model performance across various devices, can be found
  [here](https://aihub.qualcomm.com/models/trocr).
 
@@ -31,17 +31,53 @@ More details on model performance across various devices, can be found
  - Number of parameters (TrOCRDecoder): 38.3M
  - Model size (TrOCRDecoder): 146 MB

+ | Model | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Precision | Primary Compute Unit | Target Model
+ |---|---|---|---|---|---|---|---|---|
+ | TrOCREncoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 50.652 ms | 7 - 9 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.tflite) |
+ | TrOCREncoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | QNN | 52.89 ms | 0 - 22 MB | FP16 | NPU | [TrOCR.so](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.so) |
+ | TrOCREncoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 39.309 ms | 0 - 178 MB | FP16 | NPU | [TrOCR.onnx](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.onnx) |
+ | TrOCREncoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | TFLITE | 40.349 ms | 5 - 306 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.tflite) |
+ | TrOCREncoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | QNN | 42.073 ms | 2 - 64 MB | FP16 | NPU | [TrOCR.so](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.so) |
+ | TrOCREncoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | ONNX | 31.228 ms | 0 - 348 MB | FP16 | NPU | [TrOCR.onnx](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.onnx) |
+ | TrOCREncoder | QCS8550 (Proxy) | QCS8550 Proxy | TFLITE | 50.061 ms | 7 - 8 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.tflite) |
+ | TrOCREncoder | QCS8550 (Proxy) | QCS8550 Proxy | QNN | 36.086 ms | 2 - 3 MB | FP16 | NPU | Use Export Script |
+ | TrOCREncoder | SA8255 (Proxy) | SA8255P Proxy | TFLITE | 50.179 ms | 7 - 9 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.tflite) |
+ | TrOCREncoder | SA8255 (Proxy) | SA8255P Proxy | QNN | 36.899 ms | 2 - 4 MB | FP16 | NPU | Use Export Script |
+ | TrOCREncoder | SA8775 (Proxy) | SA8775P Proxy | TFLITE | 51.951 ms | 7 - 9 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.tflite) |
+ | TrOCREncoder | SA8775 (Proxy) | SA8775P Proxy | QNN | 37.124 ms | 2 - 3 MB | FP16 | NPU | Use Export Script |
+ | TrOCREncoder | SA8650 (Proxy) | SA8650P Proxy | TFLITE | 51.056 ms | 7 - 9 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.tflite) |
+ | TrOCREncoder | SA8650 (Proxy) | SA8650P Proxy | QNN | 37.072 ms | 2 - 7 MB | FP16 | NPU | Use Export Script |
+ | TrOCREncoder | QCS8450 (Proxy) | QCS8450 Proxy | TFLITE | 60.938 ms | 7 - 296 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.tflite) |
+ | TrOCREncoder | QCS8450 (Proxy) | QCS8450 Proxy | QNN | 60.192 ms | 2 - 64 MB | FP16 | NPU | Use Export Script |
+ | TrOCREncoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | TFLITE | 36.174 ms | 3 - 119 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.tflite) |
+ | TrOCREncoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | QNN | 33.016 ms | 2 - 66 MB | FP16 | NPU | Use Export Script |
+ | TrOCREncoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | ONNX | 23.693 ms | 6 - 207 MB | FP16 | NPU | [TrOCR.onnx](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.onnx) |
+ | TrOCREncoder | Snapdragon X Elite CRD | Snapdragon® X Elite | QNN | 33.885 ms | 2 - 2 MB | FP16 | NPU | Use Export Script |
+ | TrOCREncoder | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 35.659 ms | 109 - 109 MB | FP16 | NPU | [TrOCR.onnx](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCREncoder.onnx) |
+ | TrOCRDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 2.6 ms | 0 - 2 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.tflite) |
+ | TrOCRDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | QNN | 3.012 ms | 3 - 263 MB | FP16 | NPU | [TrOCR.so](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.so) |
+ | TrOCRDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 2.843 ms | 1 - 3 MB | FP16 | NPU | [TrOCR.onnx](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.onnx) |
+ | TrOCRDecoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | TFLITE | 1.851 ms | 0 - 190 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.tflite) |
+ | TrOCRDecoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | QNN | 2.471 ms | 0 - 51 MB | FP16 | NPU | [TrOCR.so](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.so) |
+ | TrOCRDecoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | ONNX | 2.148 ms | 0 - 149 MB | FP16 | NPU | [TrOCR.onnx](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.onnx) |
+ | TrOCRDecoder | QCS8550 (Proxy) | QCS8550 Proxy | TFLITE | 2.562 ms | 0 - 2 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.tflite) |
+ | TrOCRDecoder | QCS8550 (Proxy) | QCS8550 Proxy | QNN | 2.631 ms | 0 - 1 MB | FP16 | NPU | Use Export Script |
+ | TrOCRDecoder | SA8255 (Proxy) | SA8255P Proxy | TFLITE | 2.608 ms | 0 - 2 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.tflite) |
+ | TrOCRDecoder | SA8255 (Proxy) | SA8255P Proxy | QNN | 2.607 ms | 1 - 3 MB | FP16 | NPU | Use Export Script |
+ | TrOCRDecoder | SA8775 (Proxy) | SA8775P Proxy | TFLITE | 2.604 ms | 0 - 2 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.tflite) |
+ | TrOCRDecoder | SA8775 (Proxy) | SA8775P Proxy | QNN | 2.613 ms | 1 - 3 MB | FP16 | NPU | Use Export Script |
+ | TrOCRDecoder | SA8650 (Proxy) | SA8650P Proxy | TFLITE | 2.573 ms | 0 - 2 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.tflite) |
+ | TrOCRDecoder | SA8650 (Proxy) | SA8650P Proxy | QNN | 2.658 ms | 1 - 3 MB | FP16 | NPU | Use Export Script |
+ | TrOCRDecoder | QCS8450 (Proxy) | QCS8450 Proxy | TFLITE | 2.814 ms | 0 - 189 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.tflite) |
+ | TrOCRDecoder | QCS8450 (Proxy) | QCS8450 Proxy | QNN | 3.375 ms | 4 - 52 MB | FP16 | NPU | Use Export Script |
+ | TrOCRDecoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | TFLITE | 2.104 ms | 0 - 27 MB | FP16 | NPU | [TrOCR.tflite](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.tflite) |
+ | TrOCRDecoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | QNN | 2.016 ms | 0 - 45 MB | FP16 | NPU | Use Export Script |
+ | TrOCRDecoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | ONNX | 2.078 ms | 0 - 35 MB | FP16 | NPU | [TrOCR.onnx](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.onnx) |
+ | TrOCRDecoder | Snapdragon X Elite CRD | Snapdragon® X Elite | QNN | 2.793 ms | 7 - 7 MB | FP16 | NPU | Use Export Script |
+ | TrOCRDecoder | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 2.881 ms | 68 - 68 MB | FP16 | NPU | [TrOCR.onnx](https://huggingface.co/qualcomm/TrOCR/blob/main/TrOCRDecoder.onnx) |





  ## Installation
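
The added performance table lists several runtimes per model and device, so one natural question is which runtime is fastest for each pair. The following is a minimal sketch, not part of the commit: `parse_rows` and `fastest_runtime` are hypothetical helper names, and the sample rows are the Samsung Galaxy S23 entries copied from the table with the `+ ` diff marker and the Target Model column dropped.

```python
# Sample rows copied from the added table (Samsung Galaxy S23 only).
ROWS = """\
| TrOCREncoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 50.652 ms | 7 - 9 MB | FP16 | NPU |
| TrOCREncoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | QNN | 52.89 ms | 0 - 22 MB | FP16 | NPU |
| TrOCREncoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 39.309 ms | 0 - 178 MB | FP16 | NPU |
| TrOCRDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 2.6 ms | 0 - 2 MB | FP16 | NPU |
| TrOCRDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | QNN | 3.012 ms | 3 - 263 MB | FP16 | NPU |
| TrOCRDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 2.843 ms | 1 - 3 MB | FP16 | NPU |
"""


def parse_rows(text):
    """Turn pipe-delimited table rows into a list of dicts (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) < 5:
            continue  # skip blank or malformed lines
        model, device, _chipset, runtime, latency = cells[:5]
        records.append(
            {
                "model": model,
                "device": device,
                "runtime": runtime,
                # "50.652 ms" -> 50.652
                "latency_ms": float(latency.split()[0]),
            }
        )
    return records


def fastest_runtime(records):
    """Map (model, device) to the record with the lowest inference time."""
    best = {}
    for record in records:
        key = (record["model"], record["device"])
        if key not in best or record["latency_ms"] < best[key]["latency_ms"]:
            best[key] = record
    return best


if __name__ == "__main__":
    for (model, device), record in fastest_runtime(parse_rows(ROWS)).items():
        print(f"{model} on {device}: {record['runtime']} ({record['latency_ms']} ms)")
```

The same two helpers would work on the full table once the diff markers and the last column are stripped; only the first five columns are read.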
 
 
@@ -97,23 +133,25 @@ device. This script does the following:
  ```bash
  python -m qai_hub_models.models.trocr.export
  ```
  ```
+ Profiling Results
+ ------------------------------------------------------------
+ TrOCREncoder
+ Device : Samsung Galaxy S23 (13)
+ Runtime : TFLITE
+ Estimated inference time (ms) : 50.7
+ Estimated peak memory usage (MB): [7, 9]
+ Total # Ops : 591
+ Compute Unit(s) : NPU (591 ops)
+
+ ------------------------------------------------------------
+ TrOCRDecoder
+ Device : Samsung Galaxy S23 (13)
+ Runtime : TFLITE
+ Estimated inference time (ms) : 2.6
+ Estimated peak memory usage (MB): [0, 2]
+ Total # Ops : 399
+ Compute Unit(s) : NPU (399 ops)
  ```

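
The new profiling summary reports the encoder and decoder separately. As a rough illustration of how the two numbers combine, the sketch below parses the summary text and estimates end-to-end latency under the assumption that the encoder runs once per image and the decoder runs once per generated token. That assumption, and the helper names `parse_summary` and `estimated_latency_ms`, are not taken from the commit; real latency also depends on pre- and post-processing that the summary does not cover.

```python
# Summary text copied from the added lines, with the "+ " diff marker removed.
SUMMARY = """\
Profiling Results
------------------------------------------------------------
TrOCREncoder
Device : Samsung Galaxy S23 (13)
Runtime : TFLITE
Estimated inference time (ms) : 50.7
Estimated peak memory usage (MB): [7, 9]
Total # Ops : 591
Compute Unit(s) : NPU (591 ops)

------------------------------------------------------------
TrOCRDecoder
Device : Samsung Galaxy S23 (13)
Runtime : TFLITE
Estimated inference time (ms) : 2.6
Estimated peak memory usage (MB): [0, 2]
Total # Ops : 399
Compute Unit(s) : NPU (399 ops)
"""


def parse_summary(text):
    """Parse the summary into {component: {field: value}} (hypothetical helper)."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, dashed separators, and the title line.
        if not line or set(line) == {"-"} or line == "Profiling Results":
            continue
        if ":" in line:
            key, value = line.split(":", 1)
            if current is not None:
                results[current][key.strip()] = value.strip()
        else:
            # A line without a colon starts a new component section.
            current = line
            results[current] = {}
    return results


def estimated_latency_ms(results, num_tokens):
    """Encoder once plus decoder once per token (an assumption, see lead-in)."""
    field = "Estimated inference time (ms)"
    encoder_ms = float(results["TrOCREncoder"][field])
    decoder_ms = float(results["TrOCRDecoder"][field])
    return encoder_ms + num_tokens * decoder_ms


if __name__ == "__main__":
    parsed = parse_summary(SUMMARY)
    print(round(estimated_latency_ms(parsed, num_tokens=10), 1))
```

Under this assumption the decoder's per-token cost dominates for long outputs, even though the encoder is the slower single call.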
 
 
@@ -241,15 +279,19 @@ provides instructions on how to use the `.so` shared library in an Android appl
  Get more details on TrOCR's performance across various devices [here](https://aihub.qualcomm.com/models/trocr).
  Explore all available models on [Qualcomm® AI Hub](https://aihub.qualcomm.com/)

+
  ## License
+ * The license for the original implementation of TrOCR can be found [here](https://github.com/microsoft/unilm/blob/master/LICENSE).
+ * The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf)
+
+

  ## References
  * [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282)
  * [Source Model Implementation](https://huggingface.co/microsoft/trocr-small-stage1)

+
+
  ## Community
  * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI.
  * For questions or feedback please [reach out to us](mailto:[email protected]).