ONNX Conversion script

#10
by ha1772007 - opened

Can you provide the script used to convert this model to q4?

The ONNX files were contributed here by Hugging Face staff member @Xenova without a conversion script, so you may want to ping @Xenova directly.

I believe he uses quantize.py; in particular, these lines appear to handle the q4 quantization: https://github.com/xenova/transformers.js/blob/v3/scripts/quantize.py#L188-L208
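For reference, if those lines wrap onnxruntime's MatMul4BitsQuantizer (the usual route for q4 block-wise weight quantization of ONNX models), the core step would look roughly like the sketch below; the file names and block_size are placeholders, not values taken from that script:

```python
# Rough sketch of q4 (4-bit block-wise) weight quantization of an ONNX model
# with onnxruntime's MatMul4BitsQuantizer. Paths and block_size are examples,
# not the exact settings used in quantize.py.
import onnx
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

model = onnx.load("model.onnx")

quantizer = MatMul4BitsQuantizer(
    model,
    block_size=32,      # weights are quantized in blocks of this many values
    is_symmetric=True,  # symmetric 4-bit quantization (no zero point)
)
quantizer.process()     # rewrites MatMul weights as 4-bit blocks in place

quantizer.model.save_model_to_file("model_q4.onnx")
```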

P.S. Are you getting good results with that quantization?

Yes, quantization gives a good speed increase, especially on CPU.

Comparison between float32 and float16 -> 99% similarity
Comparison between float32 and int8 -> 97% similarity

I calculated the similarity using cosine similarity over 80+ text pieces, each around 2,000 characters long.
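For context, a check like the one described above can be reproduced with a small script such as the following; embed_fp32 and embed_q4 are hypothetical functions that run the two ONNX variants and return one embedding vector per text:

```python
# Minimal sketch of the similarity check: embed the same texts with two model
# variants, compare the embeddings pairwise with cosine similarity, and average.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two 1-D embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def average_similarity(texts, embed_fp32, embed_q4) -> float:
    # embed_fp32 / embed_q4 are placeholders for functions that run the
    # float32 and quantized ONNX models and return one vector per text.
    scores = [cosine_similarity(embed_fp32(t), embed_q4(t)) for t in texts]
    return float(np.mean(scores))
```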

ha1772007 changed discussion status to closed
ha1772007 changed discussion status to open
