ONNX Conversion script

#10
by ha1772007 - opened

Can you provide the script used to convert this model to q4?

The ONNX files were contributed here by Hugging Face staff member @Xenova without a conversion script, so you may want to ping @Xenova directly.

I believe he uses quantize.py; in particular, these lines appear to handle the q4 quantization: https://github.com/xenova/transformers.js/blob/v3/scripts/quantize.py#L188-L208
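For reference, if those lines wrap onnxruntime's MatMul4BitsQuantizer (the usual route for q4 block-wise weight quantization of ONNX models), the core step would look roughly like the sketch below; the file names and block_size are placeholders, not values taken from that script:

```python
# Rough sketch of q4 (4-bit block-wise) weight quantization of an ONNX model
# with onnxruntime's MatMul4BitsQuantizer. Paths and block_size are examples,
# not the exact settings used in quantize.py.
import onnx
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

model = onnx.load("model.onnx")

quantizer = MatMul4BitsQuantizer(
    model,
    block_size=32,      # weights are quantized in blocks of this many values
    is_symmetric=True,  # symmetric 4-bit quantization (no zero point)
)
quantizer.process()     # rewrites MatMul weights as 4-bit blocks in place

quantizer.model.save_model_to_file("model_q4.onnx")
```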

P.S. Are you getting good results with that quantization?

Yes, quantization gives a good speed increase, especially on CPU.

Comparison between float32 and float16 -> 99% similarity
Comparison between float32 and int8 -> 97% similarity

I calculated the similarity using cosine similarity over 80+ text pieces, each around 2,000 characters long.
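For context, a check like the one described above can be reproduced with a small script such as the following; embed_fp32 and embed_q4 are hypothetical functions that run the two ONNX variants and return one embedding vector per text:

```python
# Minimal sketch of the similarity check: embed the same texts with two model
# variants, compare the embeddings pairwise with cosine similarity, and average.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two 1-D embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def average_similarity(texts, embed_fp32, embed_q4) -> float:
    # embed_fp32 / embed_q4 are placeholders for functions that run the
    # float32 and quantized ONNX models and return one vector per text.
    scores = [cosine_similarity(embed_fp32(t), embed_q4(t)) for t in texts]
    return float(np.mean(scores))
```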

ha1772007 changed discussion status to closed
ha1772007 changed discussion status to open
