int8 quantization with onnx runtime #4
by Florianoli
Hi,
I'm currently trying to quantize the model to int8. Somehow the sparse representations are missing from the resulting model. How did you manage to keep the sparse embeddings when quantizing to int8?
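For reference, this is roughly the kind of quantization I mean; a minimal sketch using ONNX Runtime's dynamic quantization API, with placeholder file paths:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Placeholder paths: the FP32 model is assumed to be exported to ONNX already.
quantize_dynamic(
    model_input="model.onnx",        # exported FP32 model
    model_output="model_int8.onnx",  # quantized int8 output
    weight_type=QuantType.QInt8,     # quantize weights to int8
)
```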
Thanks!