Add/update the quantized ONNX model files and README.md for Transformers.js v3

## Applied Quantizations

### ✅ Based on `decoder_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_bnb4.onnx` (added)

### ❌ Based on `encoder_model.onnx` *with* slimming

```
None
```
↳ ❌ `int8`: `encoder_model_int8.onnx` (added but JS-based E2E test failed)
```
dtype not specified for "decoder_model_merged". Using the default dtype (fp32) for this device (cpu).
/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25
__classPrivateFieldGet(this, _OnnxruntimeSessionHandler_inferenceSession, "f").loadModel(pathOrBuffer, options);
^

Error: Could not find an implementation for ConvInteger(10) node with name '/embeddings/patch_embeddings/projection/Conv_quant'
at new OnnxruntimeSessionHandler (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25:92)
at Immediate.<anonymous> (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:67:29)
at process.processImmediate (node:internal/timers:485:21)

Node.js v22.16.0
```
↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)
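
One way to avoid the failing `int8` encoder while still loading quantized weights is to choose a dtype per ONNX file. The sketch below is illustrative only: it assumes the Transformers.js v3 `dtype` option accepts an object keyed by file stem (such as `encoder_model`) and that the package is imported from `@huggingface/transformers`. `AVAILABLE` and `pickDtypes` are hypothetical names introduced here, not part of the library.

```js
// Variants this PR reports as working, keyed by ONNX file stem.
// `int8` is left out for the encoder because its E2E test failed (ConvInteger).
const AVAILABLE = {
  encoder_model: ['uint8', 'q4', 'q4f16', 'bnb4'],
  decoder_model_merged: ['fp16', 'int8', 'uint8', 'q4', 'q4f16', 'bnb4'],
};

// Hypothetical helper: for each file, take the first preferred dtype that is
// available, falling back to fp32 (the default named in the log above).
function pickDtypes(preferred, available = AVAILABLE, fallback = 'fp32') {
  const result = {};
  for (const [name, variants] of Object.entries(available)) {
    result[name] = preferred.find((d) => variants.includes(d)) ?? fallback;
  }
  return result;
}

// Prefer int8, then uint8: the encoder falls through to uint8.
const dtype = pickDtypes(['int8', 'uint8']);
console.log(dtype); // { encoder_model: 'uint8', decoder_model_merged: 'int8' }

// Assumed usage with Transformers.js v3 (not executed here):
// const texify = await pipeline('image-to-text', 'Xenova/texify2', { dtype });
```

Keeping the list of working variants in one place means the unsupported `int8` encoder is never requested, whichever preference order is passed in.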

### ✅ Based on `decoder_with_past_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_with_past_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_with_past_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_with_past_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_with_past_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_with_past_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_with_past_model_bnb4.onnx` (added)

### ❌ Based on `decoder_model_merged.onnx` *with* slimming

```
0%| | 0/1 [00:00<?, ?it/s]
Processing /tmp/tmpmif_vzn4/decoder_model_merged.onnx: 0%| | 0/1 [00:00<?, ?it/s]
0%| | 0/6 [00:00<?, ?it/s]
- Quantizing to fp16: 0%| | 0/6 [00:00<?, ?it/s]
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.960464477539063e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.960464477539063e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
warnings.warn(
- Quantizing to fp16: 0%| | 0/6 [00:16<?, ?it/s]
Processing /tmp/tmpmif_vzn4/decoder_model_merged.onnx: 0%| | 0/1 [00:16<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
main()
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
quantize(input_folder, output_folder, quantization_args)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
quantize_fp16(
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 223, in quantize_fp16
check_and_save_model(model_fp16, save_path)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 29, in check_and_save_model
strict_check_model(model)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 21, in strict_check_model
raise e
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 16, in strict_check_model
onnx.checker.check_model(model_or_path, full_check=True)
File "/home/ubuntu/.cache/uv/archive-v0/7hYcxZ8pwavXeKpAYRaHY/lib/python3.12/site-packages/onnx/checker.py", line 179, in check_model
C.check_model(
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] Inference error(s): (op_type:If, node name: optimum::if): [ShapeInferenceError] Inference error(s): (op_type:Add, node name: /decoder/decoder/embed_positions/Add): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (1) vs (0)
```

### ✅ Based on `decoder_model_merged.onnx` *without* slimming

↳ ✅ `fp16`: `decoder_model_merged_fp16.onnx` (replaced because it was invalid)
↳ ✅ `int8`: `decoder_model_merged_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_merged_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_merged_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_merged_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_merged_bnb4.onnx` (added)

Files changed (23)

README.md +3 -3
onnx/decoder_model_bnb4.onnx +3 -0
onnx/decoder_model_fp16.onnx +3 -0
onnx/decoder_model_int8.onnx +3 -0
onnx/decoder_model_merged_bnb4.onnx +3 -0
onnx/decoder_model_merged_fp16.onnx +2 -2
onnx/decoder_model_merged_int8.onnx +3 -0
onnx/decoder_model_merged_q4.onnx +3 -0
onnx/decoder_model_merged_q4f16.onnx +3 -0
onnx/decoder_model_merged_uint8.onnx +3 -0
onnx/decoder_model_q4.onnx +3 -0
onnx/decoder_model_q4f16.onnx +3 -0
onnx/decoder_model_uint8.onnx +3 -0
onnx/decoder_with_past_model_bnb4.onnx +3 -0
onnx/decoder_with_past_model_fp16.onnx +3 -0
onnx/decoder_with_past_model_int8.onnx +3 -0
onnx/decoder_with_past_model_q4.onnx +3 -0
onnx/decoder_with_past_model_q4f16.onnx +3 -0
onnx/decoder_with_past_model_uint8.onnx +3 -0
onnx/encoder_model_bnb4.onnx +3 -0
onnx/encoder_model_q4.onnx +3 -0
onnx/encoder_model_q4f16.onnx +3 -0
onnx/encoder_model_uint8.onnx +3 -0

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a6eb87be78c6414271ae79aa77cfc9591879da8dec6b12578b9d118d5fec89ec
-size 477600499

@@ -8,15 +8,15 @@ https://huggingface.co/vikp/texify2 with ONNX weights to be compatible with Tran
 ## Usage (Transformers.js)
-If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@xenova/transformers) using:
 ```bash
-npm i @xenova/transformers
 ```
 **Example:** Image-to-text w/ `Xenova/texify2`.
 ```js
-import { pipeline } from '@xenova/transformers';
 // Create an image-to-text pipeline
 const texify = await pipeline('image-to-text', 'Xenova/texify2');

+version https://git-lfs.github.com/spec/v1
+oid sha256:8bab90ce5bec4d6dafb65d7d7b6b3d9227e6163877adc5c3af9fcc9b549cca69
+size 316394992

+version https://git-lfs.github.com/spec/v1
+oid sha256:d58a221bc2d89d82d2618f6261d447352f79faeef46595a57ba870792c21f065
+size 477052468

+version https://git-lfs.github.com/spec/v1
+oid sha256:0044901ae20011c5f22eef9bc8f5624cd9d21985c1cc6cf0eebe451e512ffca4
+size 239278867

+version https://git-lfs.github.com/spec/v1
+oid sha256:76576cf4938e00195481174255fd8724807158ca7c1d5e493b343d6d10ad46cb
+size 316729008

 version https://git-lfs.github.com/spec/v1
+oid sha256:e8bcfb4972bfd9eb6c2dcb9fb9ec9fb2929863b32f021ee2c14557eaecff7e37
+size 477607346

+version https://git-lfs.github.com/spec/v1
+oid sha256:23bcaf7313bc706df19d147a63437af287b0ef3bb2b78484840c60b71e88c1f3
+size 239901397

+version https://git-lfs.github.com/spec/v1
+oid sha256:f7dd252e684bf1cc6ad4086a905fd0fb01e5bcde59d9dfcf1ed3bdb146a2ae3c
+size 328316375

+version https://git-lfs.github.com/spec/v1
+oid sha256:e3db2718cdbfb1b575d93b1b15714d7baeaa3cfc18d69f2e1f5fb1dcfa0d144a
+size 211079229

+version https://git-lfs.github.com/spec/v1
+oid sha256:6c6843843931cea829b4356df7b2f57bd710fddcfa35098d3e6255e9a23ac209
+size 239901442

+version https://git-lfs.github.com/spec/v1
+oid sha256:856d17825da76a53eb482b2bd30d461ce792d9eac70acdce1d2591f51dd0a08f
+size 327982944

+version https://git-lfs.github.com/spec/v1
+oid sha256:7319d162d09dc60679614d207bdde686d4826b20c567b8f509338baceb50d808
+size 210526536

+version https://git-lfs.github.com/spec/v1
+oid sha256:a5a4715c93e0ffa8c930c2cc82e8341be6b824a91b57e9a221d2327371761fb2
+size 239278912

+version https://git-lfs.github.com/spec/v1
+oid sha256:4351d55c6bfe2343cab05ae309f053b88ee3763f4cd154a9398015138955966c
+size 306797602

+version https://git-lfs.github.com/spec/v1
+oid sha256:3dd2a3b2841961a1de6d6ede7295165d52349c296b84785b9ea993fcfb4b7367
+size 443379456

+version https://git-lfs.github.com/spec/v1
+oid sha256:df3ca8e81f58aac4b013c11919939565a82525fd29de0fdc4c744f14351a979e
+size 222327880

+version https://git-lfs.github.com/spec/v1
+oid sha256:8ff443e976ad067e3b14f8806b06409d8bc5cddf571c5d4e1c38d4daa95a125d
+size 317337106

+version https://git-lfs.github.com/spec/v1
+oid sha256:ca4c3f9e9a234b156adb44b952fe5d6975fc814501140bde533cfc9da9844f67
+size 200968404

+version https://git-lfs.github.com/spec/v1
+oid sha256:02a8637b5fddddab2d5ff841391a792cd2e780a8de10b49c8bd67f2c019696fa
+size 222327918

+version https://git-lfs.github.com/spec/v1
+oid sha256:89d37d113bee3de55cd55039023ae38ef0440feda4741f1a8826a87ac3f2e6cc
+size 44444793

+version https://git-lfs.github.com/spec/v1
+oid sha256:1ee088e1bb329a3574e4592f4cc4c7fe4526080ed1053c4cf76fe50ea76bc2ec
+size 49064106

+version https://git-lfs.github.com/spec/v1
+oid sha256:fc41fd244968aec06b8e15ffd6eb6264f01f8ebbd50f82ec40e1fbcf43ec08df
+size 43766740

+version https://git-lfs.github.com/spec/v1
+oid sha256:8b718938b7d9dbe00b41a298e28781e89b906ac636256f1e5a683bc3331e5c5c
+size 76913015
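
Each added `.onnx` entry above is a Git LFS pointer (three `key value` lines: `version`, `oid`, `size`) rather than the model binary itself. As a minimal sketch of reading one, the snippet below parses the first added pointer from this diff; `parseLfsPointer` is a hypothetical helper name, not an existing tool.

```js
// Hypothetical helper: parse the text of a Git LFS pointer file.
function parseLfsPointer(text) {
  const fields = {};
  for (const line of text.trim().split('\n')) {
    const space = line.indexOf(' ');
    if (space === -1) continue; // skip lines without a "key value" shape
    fields[line.slice(0, space)] = line.slice(space + 1).trim();
  }
  // The oid has the form "<algorithm>:<hex digest>".
  const [algo, hash] = (fields.oid ?? '').split(':');
  return { version: fields.version, algo, hash, size: Number(fields.size) };
}

const pointer = parseLfsPointer(`version https://git-lfs.github.com/spec/v1
oid sha256:8bab90ce5bec4d6dafb65d7d7b6b3d9227e6163877adc5c3af9fcc9b549cca69
size 316394992`);

// Report the hash algorithm and the size of the real file in MiB.
console.log(pointer.algo, (pointer.size / 2 ** 20).toFixed(1) + ' MiB');
```

The `size` field is the byte size of the real file, so it can be used to compare quantized variants without downloading them.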