ONNX model output

#38
by Riddler2024 - opened

The ONNX model's output is different from the output of Transformers and SentenceTransformers.

I inspected the ONNX model with netron.app: the outputs text_embeds and 13049 correspond to the sequence_output and pooled_output of the XLMRobertaModel. However, when I compared them with the outputs of the model loaded from model.safetensors, the results were different.

Jina AI org

Hi @Riddler2024, have you tried running inference the way we demonstrate in the README? If there's a slight difference, it could be because ONNX uses fp32, while ST or HF may use bf16 when running in a GPU environment. If this doesn't resolve the issue, please share a code snippet so I can reproduce the behavior.
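For context, here is a quick sketch of how much rounding a bf16 round-trip introduces relative to fp32; the tensor is just dummy data, not real embeddings:

```python
import torch

# Dummy activations standing in for model outputs (not real embeddings).
x = torch.randn(4, 1024)

# Round-trip through bfloat16: bf16 keeps ~8 bits of mantissa precision,
# so values incur roughly 2**-8 (~0.4%) relative rounding error vs. fp32.
roundtrip = x.to(torch.bfloat16).float()
print((x - roundtrip).abs().max())
```

Differences of that magnitude between fp32 ONNX output and bf16 GPU output are expected and harmless for retrieval.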

I am using the example code from the README. I extracted the ONNX model output 13049 and compared it with the output of XLMRobertaModel.forward in the xlm-roberta-flash-implementation repository, and in both cases the outputs were not normalized.
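A minimal sketch of that kind of comparison, assuming the jinaai/jina-embeddings-v3 checkpoint and ONNX Runtime; the local ONNX path, input/output names, and output index are assumptions, so verify them against the graph in netron:

```python
import numpy as np
import onnxruntime as ort
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "jinaai/jina-embeddings-v3"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("A sample sentence", return_tensors="pt")
with torch.no_grad():
    # Raw forward() output: no pooling or normalization applied here.
    hf_hidden = model(**inputs).last_hidden_state.numpy()

session = ort.InferenceSession("onnx/model.onnx")  # assumed local path
# Input/output names depend on the exported graph; check them in netron.
onnx_outputs = session.run(None, {k: v.numpy() for k, v in inputs.items()})
onnx_hidden = onnx_outputs[0]  # index of the sequence-output node is assumed

print(np.abs(hf_hidden - onnx_hidden).max())
```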

Jina AI org

@Riddler2024, OK, I see the issue. The ONNX model mimics the forward function, which doesn't apply any normalization by itself; however, both the HF encode method and SentenceTransformers include a normalization step. This is why the outputs differ. I'd suggest applying the normalization yourself after running inference. Would that be convenient for your application?
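A minimal sketch of that post-processing step, assuming the raw ONNX output is a NumPy array of shape (batch_size, hidden_dim); `raw` below is dummy data standing in for it:

```python
import numpy as np

def l2_normalize(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize each embedding row, replicating the normalization
    step that HF encode and SentenceTransformers apply after forward."""
    norms = np.linalg.norm(embeddings, ord=2, axis=-1, keepdims=True)
    return embeddings / norms

# Dummy data standing in for the unnormalized ONNX output.
raw = np.random.randn(2, 1024).astype(np.float32)
normalized = l2_normalize(raw)
assert np.allclose(np.linalg.norm(normalized, axis=-1), 1.0, atol=1e-5)
```

After this step, cosine similarities computed from the ONNX embeddings should match the ST/HF results up to floating-point precision.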

Riddler2024 changed discussion status to closed
