int8 quantization with ONNX Runtime

#4 opened by Florianoli

Hi,
I'm currently trying to quantize the model to int8, but the sparse representations are missing from the resulting model. How did you manage to keep the sparse embeddings when applying int8 quantization?
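For reference, this is roughly what I'm running: a minimal sketch of a dynamic int8 quantization step with ONNX Runtime, plus the check I use to see which outputs survive. The file paths (`model.onnx`, `model_int8.onnx`) are placeholders for my own export, not files from this repo.

```python
# Minimal sketch of my quantization step; the paths below are
# placeholders for my exported model, not files from this repo.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",        # export that still has the sparse output
    model_output="model_int8.onnx",  # int8 result where the sparse output disappears
    weight_type=QuantType.QInt8,
)

# Inspect which outputs the quantized graph still exposes.
import onnxruntime as ort

sess = ort.InferenceSession("model_int8.onnx", providers=["CPUExecutionProvider"])
print([o.name for o in sess.get_outputs()])
```

After this step, the sparse embedding output no longer shows up in `get_outputs()`, while the dense output is still there.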

Thanks!
