Add/update the quantized ONNX model files and README.md for Transformers.js v3

## Applied Quantizations

### ✅ Based on `decoder_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_bnb4.onnx` (added)

### ❌ Based on `encoder_model.onnx` *with* slimming

```
None
```
↳ ❌ `int8`: `encoder_model_int8.onnx` (added but JS-based E2E test failed)
```
dtype not specified for "decoder_model_merged". Using the default dtype (fp32) for this device (cpu).
/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25
__classPrivateFieldGet(this, _OnnxruntimeSessionHandler_inferenceSession, "f").loadModel(pathOrBuffer, options);
^

Error: Could not find an implementation for ConvInteger(10) node with name '/embeddings/patch_embeddings/projection/Conv_quant'
at new OnnxruntimeSessionHandler (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25:92)
at Immediate.<anonymous> (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:67:29)
at process.processImmediate (node:internal/timers:485:21)

Node.js v22.16.0
```
↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)
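
Because the `int8` encoder fails to load (the runtime reports no implementation for `ConvInteger`), a consumer may want to avoid that one combination while still using `int8` elsewhere. The following is a minimal sketch of that idea, not code from this PR: `FAILED_DTYPES` and `chooseDtypes` are hypothetical names, the `uint8` fallback is an assumption, and the commented-out `pipeline` call assumes the per-component `dtype` option of Transformers.js v3.

```javascript
// Hypothetical helper, not part of Transformers.js.
// Component/dtype combinations reported as failing in the list above.
const FAILED_DTYPES = {
  encoder_model: new Set(['int8']),
};

// Build a per-component dtype map, swapping any known-bad choice for `fallback`.
function chooseDtypes(preferred, fallback = 'uint8') {
  const result = {};
  for (const [component, dtype] of Object.entries(preferred)) {
    const failed = FAILED_DTYPES[component];
    result[component] = failed && failed.has(dtype) ? fallback : dtype;
  }
  return result;
}

const dtype = chooseDtypes({
  encoder_model: 'int8',
  decoder_model_merged: 'int8',
});
console.log(dtype);

// Assumed usage with Transformers.js v3 (needs network access, so left as a comment):
// import { pipeline } from '@huggingface/transformers';
// const texify = await pipeline('image-to-text', 'Xenova/texify', { dtype });
```

Keeping the failing combinations in one table means the fallback can be dropped once a runtime with `ConvInteger` support is available.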

### ✅ Based on `decoder_with_past_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_with_past_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_with_past_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_with_past_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_with_past_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_with_past_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_with_past_model_bnb4.onnx` (added)

### ❌ Based on `decoder_model_merged.onnx` *with* slimming

```
0%| | 0/1 [00:00<?, ?it/s]
Processing /tmp/tmpc9bryt1h/decoder_model_merged.onnx: 0%| | 0/1 [00:00<?, ?it/s]
0%| | 0/6 [00:00<?, ?it/s]
- Quantizing to fp16: 0%| | 0/6 [00:00<?, ?it/s]
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.960464477539063e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.960464477539063e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
  warnings.warn(
- Quantizing to fp16: 0%| | 0/6 [00:16<?, ?it/s]
Processing /tmp/tmpc9bryt1h/decoder_model_merged.onnx: 0%| | 0/1 [00:16<?, ?it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
    main()
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
    quantize(input_folder, output_folder, quantization_args)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
    quantize_fp16(
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 223, in quantize_fp16
    check_and_save_model(model_fp16, save_path)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 29, in check_and_save_model
    strict_check_model(model)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 21, in strict_check_model
    raise e
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 16, in strict_check_model
    onnx.checker.check_model(model_or_path, full_check=True)
  File "/home/ubuntu/.cache/uv/archive-v0/7hYcxZ8pwavXeKpAYRaHY/lib/python3.12/site-packages/onnx/checker.py", line 179, in check_model
    C.check_model(
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] Inference error(s): (op_type:If, node name: optimum::if): [ShapeInferenceError] Inference error(s): (op_type:Add, node name: /decoder/decoder/embed_positions/Add): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (1) vs (0)
```

### ✅ Based on `decoder_model_merged.onnx` *without* slimming

↳ ✅ `fp16`: `decoder_model_merged_fp16.onnx` (replaced because it was invalid)
↳ ✅ `int8`: `decoder_model_merged_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_merged_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_merged_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_merged_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_merged_bnb4.onnx` (added)
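
The files listed above follow a regular naming pattern: the base model name, a dtype suffix, and the `.onnx` extension, stored under `onnx/`. A small sketch of that mapping is shown here for illustration only. `DTYPE_SUFFIX` and `onnxPath` are hypothetical names, and the empty suffix for `fp32` is an assumption based on the unsuffixed base files (for example `decoder_model.onnx`).

```javascript
// Hypothetical lookup table mirroring the file names in this PR.
// The empty fp32 suffix is an assumption based on the unsuffixed base files.
const DTYPE_SUFFIX = {
  fp32: '',
  fp16: '_fp16',
  int8: '_int8',
  uint8: '_uint8',
  q4: '_q4',
  q4f16: '_q4f16',
  bnb4: '_bnb4',
};

// Return the repository path of the ONNX file for a base name and dtype.
function onnxPath(baseName, dtype) {
  if (!Object.hasOwn(DTYPE_SUFFIX, dtype)) {
    throw new Error(`Unknown dtype: ${dtype}`);
  }
  return `onnx/${baseName}${DTYPE_SUFFIX[dtype]}.onnx`;
}

console.log(onnxPath('decoder_model_merged', 'q4f16'));
console.log(onnxPath('encoder_model', 'uint8'));
```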

Files changed (23)

README.md +3 -3
onnx/decoder_model_bnb4.onnx +3 -0
onnx/decoder_model_fp16.onnx +3 -0
onnx/decoder_model_int8.onnx +3 -0
onnx/decoder_model_merged_bnb4.onnx +3 -0
onnx/decoder_model_merged_fp16.onnx +2 -2
onnx/decoder_model_merged_int8.onnx +3 -0
onnx/decoder_model_merged_q4.onnx +3 -0
onnx/decoder_model_merged_q4f16.onnx +3 -0
onnx/decoder_model_merged_uint8.onnx +3 -0
onnx/decoder_model_q4.onnx +3 -0
onnx/decoder_model_q4f16.onnx +3 -0
onnx/decoder_model_uint8.onnx +3 -0
onnx/decoder_with_past_model_bnb4.onnx +3 -0
onnx/decoder_with_past_model_fp16.onnx +3 -0
onnx/decoder_with_past_model_int8.onnx +3 -0
onnx/decoder_with_past_model_q4.onnx +3 -0
onnx/decoder_with_past_model_q4f16.onnx +3 -0
onnx/decoder_with_past_model_uint8.onnx +3 -0
onnx/encoder_model_bnb4.onnx +3 -0
onnx/encoder_model_q4.onnx +3 -0
onnx/encoder_model_q4f16.onnx +3 -0
onnx/encoder_model_uint8.onnx +3 -0

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e172d32c90e3f76252360d285b13bc62c2d155040116e29e86353f5de96b4176
-size 477600499

@@ -8,15 +8,15 @@ https://huggingface.co/vikp/texify with ONNX weights to be compatible with Trans
 ## Usage (Transformers.js)
-If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@xenova/transformers) using:
 ```bash
-npm i @xenova/transformers
 ```
 **Example:** Image-to-text w/ `Xenova/texify`.
 ```js
-import { pipeline } from '@xenova/transformers';
 // Create an image-to-text pipeline
 const texify = await pipeline('image-to-text', 'Xenova/texify');

+version https://git-lfs.github.com/spec/v1
+oid sha256:399afba1b7a4b8563bfe62d44e992b46fa0f52c01c7169be6f7377ebb0a76fd8
+size 316394992

+version https://git-lfs.github.com/spec/v1
+oid sha256:f87be07b492c6c4fb9b40e0a54dc111a2073bf48bf2847a238185eb7108e136c
+size 477052468

+version https://git-lfs.github.com/spec/v1
+oid sha256:327811c9e2350576ce14541aee99831cedc45ccfae55cd02160bffc2bc575443
+size 239278866

+version https://git-lfs.github.com/spec/v1
+oid sha256:a1133b5014e24b56d225dab8e835b4ef7db5eee99cded72b4153025da8b8cba7
+size 316729008

 version https://git-lfs.github.com/spec/v1
+oid sha256:106e101f2673fff51ed2a4c5dfe0f37cf8686709bd62e00bf40f8184ab0c79e1
+size 477607346

+version https://git-lfs.github.com/spec/v1
+oid sha256:e2f9770cc153791918e4072db1cd16be5f67a40ce20bba059d4070c311e79a8d
+size 239901396

+version https://git-lfs.github.com/spec/v1
+oid sha256:8000ed419e1feae0ab8df0878e0a42161b1d1d636a7afb563af775639fa16b79
+size 328316375

+version https://git-lfs.github.com/spec/v1
+oid sha256:b2f3d371d42492a805517c7c1f18385083d5b2dd243cb4603d5bbb75fb82f5f4
+size 211079229

+version https://git-lfs.github.com/spec/v1
+oid sha256:beebf16bbad992f61d9226d7371c9755fbdbd061b27c1655e0f9ad43f9482a1e
+size 239901439

+version https://git-lfs.github.com/spec/v1
+oid sha256:11b28c21ac09a270b9fc5bdc54f69126b038c32dcf2773bec3e15c0d024657cf
+size 327982944

+version https://git-lfs.github.com/spec/v1
+oid sha256:3e51d958f7a3293d89a9266a930983da99bc3b8aeb8cefb4276a893b885d11c5
+size 210526536

+version https://git-lfs.github.com/spec/v1
+oid sha256:157477dad146a9465e4066183aba46f1d0179d19e16dc1496f1b34a39c5decb4
+size 239278909

+version https://git-lfs.github.com/spec/v1
+oid sha256:bd0c82bb96ec9096ffae6b67a590fd22516594ba4b0100602bd8ce57432ae9cb
+size 306797602

+version https://git-lfs.github.com/spec/v1
+oid sha256:69f5628ab7a38d91c5c0ba807f30c716690b878fed7cef6bbec4727f1bcdd81c
+size 443379456

+version https://git-lfs.github.com/spec/v1
+oid sha256:53777d20f5f0b6bb92cb274a28d3aea6cb755f157e6cc633339b42dc9bb2b37c
+size 222327879

+version https://git-lfs.github.com/spec/v1
+oid sha256:0d9411b8e6f31f73797a2687193ce19ec48203f4e42d8f7be9cab033cd514ac9
+size 317337106

+version https://git-lfs.github.com/spec/v1
+oid sha256:788a11f42b985d964542ab13a186f25215c797e299b1d3cc2fda4cffde0640df
+size 200968404

+version https://git-lfs.github.com/spec/v1
+oid sha256:f6da58050fa3ab7feaffc5bc7abfa16e24df091e79850d5eaa07d69d1067bdaf
+size 222327917

+version https://git-lfs.github.com/spec/v1
+oid sha256:59c817e01db331bd2a916682f649af699acc74c1e322683c5d5569379ab9e325
+size 44444793

+version https://git-lfs.github.com/spec/v1
+oid sha256:522784f6d16a056ede03f400995733fc08ca6c84eb7b52330a90407c056035e8
+size 49064106

+version https://git-lfs.github.com/spec/v1
+oid sha256:81a9569a4c213a4bb4f63f40c304abe4551f406e41d1f7d7fc72668593dd164a
+size 43766740

+version https://git-lfs.github.com/spec/v1
+oid sha256:8764d32969e32f374203ef7a027762f905729505add9bb143bd90175aa9f8e40
+size 76913016