Snowflake/snowflake-arctic-embed-m-v2.0 · Load model without "trust_remote_code=True"

12 days ago

To load the model it is necessary to set "trust_remote_code=True". This makes it hard to recommend the use of this model. "Snowflake/snowflake-arctic-embed-l-v2.0" does not require this setting. Is it possible to enable loading the model without trusting remote code execution, analogous to the larger model version?

lukemerrick

Snowflake org 12 days ago

Remote code trust is required because the underlying model implementation (mGTE base) is not a part of the transformers package the way that the base model for 2.0 L (the XLMR model) is.

In particular, the remote code you are trusting is this file: https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0/blob/main/modeling_hf_alibaba_nlp_gte.py

There are several ways to deal with this trust situation:

You can pin a specific revision of the model when loading the model. This allows you to lock in a specific version which you have examined yourself.
You can include a copy of the model implementation in your code (it's Apache 2 licensed) and then load Arctic Embed 2 M using that code instead of trusting remote code (i.e. something like from my_code.copy_of_gte import GtePreTrainedModel; GtePreTrainedModel.from_pretrained('Snowflake/snowflake-arctic-embed-m-v2.0')).
You can open a PR to try and get the GTE model architecture merged into transformers.
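
As a rough illustration of the first option, here is a minimal sketch of pinning a revision when loading. It assumes the remote code lives in the same repository as the weights, so that the `revision` argument of `from_pretrained` covers both. The `REVIEWED_REVISION` value is a placeholder, not a real commit, and the helper names `pinned_load_kwargs` and `load_pinned` are made up for this example. Requiring a full commit hash (rather than a branch or tag name, which can move) is a design choice of the sketch, not something the library enforces.

```python
import re

MODEL_ID = "Snowflake/snowflake-arctic-embed-m-v2.0"

# Placeholder only: replace with the commit hash you have actually reviewed.
REVIEWED_REVISION = "0" * 40

_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def pinned_load_kwargs(revision: str) -> dict:
    """Build from_pretrained kwargs that pin the load to one reviewed commit."""
    # Branch and tag names can be moved to new commits later, so this sketch
    # only accepts a full 40-character commit hash.
    if not _FULL_SHA.fullmatch(revision):
        raise ValueError("pin a full 40-character commit hash, not a branch or tag")
    return {"revision": revision, "trust_remote_code": True}


def load_pinned(model_id: str = MODEL_ID, revision: str = REVIEWED_REVISION):
    """Load the model at the pinned revision (needs transformers and network access)."""
    # Imported lazily so pinned_load_kwargs can be used without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(model_id, **pinned_load_kwargs(revision))
```

Calling `load_pinned()` still executes the repository's modeling code, but only the version at the commit you examined; a later push to the repository does not change what gets run.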

Is there a specific reason you find it hard to trust the code in this model's repository the same way you trust the code in the transformers package? It would be great to understand your needs more specifically.

T-r-y

8 days ago

Thank you for the quick answer and workarounds!
Regarding your question: I expect the transformers code to be better audited, since many more people use transformers.
