Load model without "trust_remote_code=True"
To load the model, it is necessary to set "trust_remote_code=True". This makes it hard to recommend the use of this model. "Snowflake/snowflake-arctic-embed-l-v2.0" does not require this setting. Is it possible to enable loading the model without trusting remote code execution, analogous to the larger model version?
Remote code trust is required because the underlying model implementation (mGTE base) is not part of the transformers package the way that the base model for 2.0 L (the XLMR model) is.
In particular, the remote code you are trusting is this file: https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0/blob/main/modeling_hf_alibaba_nlp_gte.py
There are several ways to deal with this trust situation:
- You can pin a specific revision of the model when loading the model. This allows you to lock in a specific version which you have examined yourself.
- You can include a copy of the model implementation in your code (it's Apache 2 licensed) and then load Arctic Embed 2 M using that code instead of trusting remote code, i.e. something like:
from my_code.copy_of_gte import GtePreTrainedModel; GtePreTrainedModel.from_pretrained('Snowflake/snowflake-arctic-embed-m-v2.0')
- You can open a PR to try to get the GTE model architecture merged into transformers.
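As a rough sketch of the first option (pinning a revision), something like the following could work. The helper names `pinned_load_kwargs` and `load_pinned_model` are made up for illustration, and the commit hash is whatever revision you have reviewed yourself. To my understanding, when the custom code lives in the same repository as the weights, the `revision` argument pins both, but please verify this against the transformers version you use.

```python
import re

MODEL_ID = "Snowflake/snowflake-arctic-embed-m-v2.0"


def pinned_load_kwargs(revision: str) -> dict:
    """Build from_pretrained kwargs that pin an exact, reviewed commit.

    Hypothetical helper: it rejects branch names and tags such as "main",
    because those can move to new code after you have audited the file.
    """
    if not re.fullmatch(r"[0-9a-f]{40}", revision):
        raise ValueError("revision must be a full 40-character commit hash")
    return {"revision": revision, "trust_remote_code": True}


def load_pinned_model(revision: str, model_id: str = MODEL_ID):
    """Load the model at the given commit (needs network access or a local cache)."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(model_id, **pinned_load_kwargs(revision))
```

You would call `load_pinned_model("<commit hash you reviewed>")`. Remote code is still executed, but only the version at that commit, so a later change to the repository does not silently change what runs on your machine.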
Is there a specific reason you find it hard to trust the code in this model's repository the same way you trust the code in the transformers package? It would be great to understand your needs more specifically.
Thank you for the quick answer and workarounds!
Regarding your question: I expect the transformers code to be better audited, since many more people use transformers.