Some weights of the model checkpoint at ./model_dir were not used when initializing BertModel

#30

by ericxian1997 - opened Nov 22, 2023

ericxian1997

Nov 22, 2023

Some weights of the model checkpoint at ./model_dir were not used when initializing BertModel: ['encoder.layer.2.mlp.wo.bias', 'encoder.layer.11.mlp.wo.weight', 'encoder.layer.0.mlp.layernorm.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.7.mlp.wo.weight', 'encoder.layer.8.mlp.wo.weight', 'encoder.layer.3.mlp.wo.weight', 'encoder.layer.1.mlp.layernorm.bias', 'encoder.layer.8.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.7.mlp.wo.bias', 'encoder.layer.9.mlp.layernorm.bias', 'encoder.layer.10.mlp.wo.weight', 'encoder.layer.11.mlp.layernorm.weight', 'encoder.layer.0.mlp.wo.weight', 'encoder.layer.8.mlp.wo.bias', 'encoder.layer.7.mlp.gated_layers.weight', 'encoder.layer.0.mlp.layernorm.bias', 'encoder.layer.11.mlp.gated_layers.weight', 'encoder.layer.3.mlp.wo.bias', 'encoder.layer.4.mlp.gated_layers.weight', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.9.mlp.wo.bias', 'encoder.layer.5.mlp.layernorm.weight', 'encoder.layer.10.mlp.layernorm.weight', 'encoder.layer.6.mlp.layernorm.bias', 'encoder.layer.2.mlp.gated_layers.weight', 'encoder.layer.4.mlp.layernorm.weight', 'encoder.layer.6.mlp.wo.bias', 'encoder.layer.7.mlp.layernorm.bias', 'encoder.layer.10.mlp.layernorm.bias', 'encoder.layer.0.mlp.gated_layers.weight', 'encoder.layer.4.mlp.wo.bias', 'encoder.layer.6.mlp.layernorm.weight', 'encoder.layer.2.mlp.wo.weight', 'encoder.layer.3.mlp.gated_layers.weight', 'encoder.layer.9.mlp.wo.weight', 'encoder.layer.7.mlp.layernorm.weight', 'encoder.layer.0.mlp.wo.bias', 'encoder.layer.10.mlp.gated_layers.weight', 'encoder.layer.4.mlp.layernorm.bias', 'encoder.layer.11.mlp.wo.bias', 'encoder.layer.8.mlp.layernorm.bias', 'encoder.layer.3.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.layernorm.bias', 'encoder.layer.4.mlp.wo.weight', 'encoder.layer.1.mlp.wo.bias', 'encoder.layer.1.mlp.layernorm.weight', 'encoder.layer.2.mlp.layernorm.weight', 
'encoder.layer.8.mlp.gated_layers.weight', 'encoder.layer.5.mlp.wo.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.9.mlp.layernorm.weight', 'encoder.layer.11.mlp.layernorm.bias', 'encoder.layer.1.mlp.wo.weight', 'encoder.layer.6.mlp.wo.weight', 'encoder.layer.9.mlp.gated_layers.weight', 'encoder.layer.6.mlp.gated_layers.weight']

This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertModel were not initialized from the model checkpoint at ./model_dir and are newly initialized: ['encoder.layer.7.intermediate.dense.weight', 'encoder.layer.10.intermediate.dense.weight', 'encoder.layer.11.output.dense.weight', 'encoder.layer.8.output.dense.weight', 'encoder.layer.9.output.LayerNorm.weight', 'encoder.layer.9.intermediate.dense.bias', 'encoder.layer.3.intermediate.dense.weight', 'encoder.layer.8.intermediate.dense.weight', 'encoder.layer.6.output.dense.bias', 'encoder.layer.1.output.dense.bias', 'encoder.layer.0.intermediate.dense.bias', 'encoder.layer.8.intermediate.dense.bias', 'encoder.layer.1.output.LayerNorm.weight', 'encoder.layer.5.intermediate.dense.bias', 'encoder.layer.5.output.dense.bias', 'encoder.layer.1.intermediate.dense.weight', 'encoder.layer.4.intermediate.dense.bias', 'encoder.layer.2.output.LayerNorm.bias', 'encoder.layer.7.output.LayerNorm.weight', 'encoder.layer.11.output.dense.bias', 'encoder.layer.7.intermediate.dense.bias', 'encoder.layer.1.intermediate.dense.bias', 'encoder.layer.7.output.dense.weight', 'encoder.layer.8.output.LayerNorm.weight', 'encoder.layer.8.output.LayerNorm.bias', 'encoder.layer.8.output.dense.bias', 'encoder.layer.11.output.LayerNorm.bias', 'encoder.layer.3.output.dense.bias', 'encoder.layer.9.output.LayerNorm.bias', 'encoder.layer.2.intermediate.dense.weight', 'encoder.layer.11.output.LayerNorm.weight', 'encoder.layer.4.output.dense.bias', 'encoder.layer.1.output.LayerNorm.bias', 'encoder.layer.9.output.dense.weight', 'encoder.layer.6.intermediate.dense.bias', 'encoder.layer.1.output.dense.weight', 'encoder.layer.3.output.LayerNorm.bias', 'encoder.layer.2.output.dense.bias', 'encoder.layer.4.intermediate.dense.weight', 'encoder.layer.0.output.dense.bias', 'encoder.layer.4.output.dense.weight', 'encoder.layer.5.output.dense.weight', 'embeddings.position_embeddings.weight', 'encoder.layer.5.output.LayerNorm.weight', 'encoder.layer.2.intermediate.dense.bias', 
'encoder.layer.3.output.LayerNorm.weight', 'encoder.layer.6.output.LayerNorm.weight', 'encoder.layer.0.output.LayerNorm.bias', 'encoder.layer.11.intermediate.dense.weight', 'encoder.layer.10.output.dense.weight', 'encoder.layer.4.output.LayerNorm.weight', 'encoder.layer.0.output.LayerNorm.weight', 'encoder.layer.0.output.dense.weight', 'encoder.layer.5.output.LayerNorm.bias', 'encoder.layer.9.intermediate.dense.weight', 'encoder.layer.3.intermediate.dense.bias', 'encoder.layer.5.intermediate.dense.weight', 'encoder.layer.4.output.LayerNorm.bias', 'encoder.layer.10.intermediate.dense.bias', 'encoder.layer.7.output.dense.bias', 'encoder.layer.9.output.dense.bias', 'encoder.layer.2.output.LayerNorm.weight', 'encoder.layer.2.output.dense.weight', 'encoder.layer.6.output.dense.weight', 'encoder.layer.10.output.LayerNorm.weight', 'encoder.layer.6.intermediate.dense.weight', 'encoder.layer.0.intermediate.dense.weight', 'encoder.layer.10.output.dense.bias', 'encoder.layer.11.intermediate.dense.bias', 'encoder.layer.10.output.LayerNorm.bias', 'encoder.layer.6.output.LayerNorm.bias', 'encoder.layer.7.output.LayerNorm.bias', 'encoder.layer.3.output.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

A-Issa-1999

Nov 23, 2023

Were you able to solve it?

michael-guenther

Jina AI org Nov 23, 2023


edited Nov 23, 2023

This usually happens if trust_remote_code=True is missing when calling AutoModel.from_pretrained. If this does not solve your problem, can you share the code and the version of the transformers package, which you were using to load the model?
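The key names in the warning point to this cause. The unused weights (`mlp.gated_layers`, `mlp.wo`, `mlp.layernorm`) belong to JinaBERT's gated feed-forward block. The newly initialized ones (`intermediate.dense`, `output.dense`, `output.LayerNorm`, `embeddings.position_embeddings`) are what the stock `BertModel` expects. Together they suggest the checkpoint was loaded into plain BERT because the custom code was not enabled. The sketch below turns that observation into a quick check. `looks_like_missing_remote_code` and its marker tuples are hypothetical helpers based only on the key names in this thread; they are not part of transformers.

```python
from typing import Iterable

# Hypothetical markers taken from the key names in the warning above:
# JinaBERT stores its gated feed-forward block under "mlp.*",
# while the stock BertModel expects "intermediate.*" / "output.*".
JINA_MLP_MARKERS = ("mlp.gated_layers", "mlp.wo", "mlp.layernorm")
BERT_FFN_MARKERS = ("intermediate.dense", "output.dense", "output.LayerNorm")


def looks_like_missing_remote_code(unexpected_keys: Iterable[str], missing_keys: Iterable[str]) -> bool:
    """Heuristic: True if JinaBERT MLP weights were ignored while a plain BERT
    feed-forward block was freshly initialized instead."""
    unexpected = list(unexpected_keys)
    missing = list(missing_keys)
    jina_weights_ignored = any(m in k for k in unexpected for m in JINA_MLP_MARKERS)
    bert_ffn_reinitialized = any(m in k for k in missing for m in BERT_FFN_MARKERS)
    return jina_weights_ignored and bert_ffn_reinitialized
```

To get the key lists without parsing the log, `from_pretrained` accepts `output_loading_info=True` and then returns a `(model, loading_info)` tuple whose dict includes `missing_keys` and `unexpected_keys`, e.g. `model, info = AutoModel.from_pretrained('./model_dir', output_loading_info=True)`. If the check returns True, try reloading with `trust_remote_code=True`.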

bwang0911 changed discussion status to closed Dec 6, 2023

YalunHu

Oct 31, 2024

> This usually happens if trust_remote_code=True is missing when calling AutoModel.from_pretrained. If this does not solve your problem, can you share the code and the version of the transformers package, which you were using to load the model?

How can I solve this if I want to use the model on a machine without an internet connection? When I set trust_remote_code=True, the model fails to load.

alicedb

Nov 5, 2024


edited Nov 5, 2024

To use the model offline, you must first download it into your Hugging Face cache folder (usually ~/.cache/huggingface/hub) while you still have internet access. This happens automatically the first time you call any of the following:

```python
from transformers import AutoModel
from sentence_transformers import SentenceTransformer
from langchain_community.embeddings import HuggingFaceEmbeddings

# with transformers
embeddings = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)
# with sentence_transformers
embeddings = SentenceTransformer('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)
# with LangChain (using sentence_transformers in the background)
embeddings = HuggingFaceEmbeddings(model_name='jinaai/jina-embeddings-v2-base-en', model_kwargs={'trust_remote_code': True})
```

On every subsequent call, the model is loaded from the cache, so it can be used offline.
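For a machine that never had internet access, one option (assumed here, not tested in this thread) is to populate the cache on a connected machine and then copy the whole Hugging Face cache folder across, or point `HF_HOME` at the copied folder. The sketch below then forces offline loading. `HF_HUB_OFFLINE`, `TRANSFORMERS_OFFLINE` and `local_files_only` are real Hugging Face settings. The helper names `enable_offline_mode` and `load_from_cache` are hypothetical.

```python
import os

MODEL_ID = "jinaai/jina-embeddings-v2-base-en"


def enable_offline_mode() -> None:
    # huggingface_hub reads these variables when it is first imported,
    # so this must run before transformers / sentence_transformers are imported.
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


def load_from_cache(model_id: str = MODEL_ID):
    """Load the model and its remote code purely from the local cache (hypothetical helper)."""
    enable_offline_mode()
    from transformers import AutoModel  # deferred import so the env vars take effect

    # trust_remote_code=True is still required for the custom JinaBERT code;
    # local_files_only=True makes from_pretrained fail instead of trying the network.
    return AutoModel.from_pretrained(model_id, trust_remote_code=True, local_files_only=True)
```

Calling `model = load_from_cache()` should then work without a connection, provided both the model weights and the remote code files were cached beforehand.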

Note that embedding vectors obtained from a model loaded with AutoModel.from_pretrained are not normalized by default, so you might notice a discrepancy compared with the SentenceTransformer and LangChain results.
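Normalization matters because on unit-length vectors the dot product equals cosine similarity, while on raw vectors it also grows with their magnitudes. Scores from unnormalized outputs therefore will not line up directly with normalized ones. Here is a minimal, dependency-free sketch using made-up toy vectors (not real model output):

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical raw (unnormalized) embeddings pointing in the same direction
raw_a = [3.0, 4.0]
raw_b = [6.0, 8.0]

raw_score = dot(raw_a, raw_b)  # 50.0, depends on the vectors' lengths
cosine = dot(l2_normalize(raw_a), l2_normalize(raw_b))  # ~1.0, parallel vectors
```

With PyTorch tensors the same step is `torch.nn.functional.normalize(embeddings, p=2, dim=1)`. sentence-transformers exposes it as `model.encode(sentences, normalize_embeddings=True)`, and LangChain's `HuggingFaceEmbeddings` can pass it through via `encode_kwargs={'normalize_embeddings': True}`.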
