Sentence transformer gives loading error

#19
by zhiminy - opened

I tried to run the sentence transformer example, but it failed with the following loading error:

TypeError                                 Traceback (most recent call last)
/home/21zz42/temp/example.ipynb Cell 3 line 1
     10 repos = ['hkunlp/instructor-large', 'intfloat/e5-large']
     12 for repo in repos:
---> 13     model = SentenceTransformer(repo)
     14     model_time = 0
     15     for _ in range(times):

File ~/temp/.venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py:95, in SentenceTransformer.__init__(self, model_name_or_path, modules, device, cache_folder, use_auth_token)
     87         snapshot_download(model_name_or_path,
     88                             cache_dir=cache_folder,
     89                             library_name='sentence-transformers',
     90                             library_version=__version__,
     91                             ignore_files=['flax_model.msgpack', 'rust_model.ot', 'tf_model.h5'],
     92                             use_auth_token=use_auth_token)
     94 if os.path.exists(os.path.join(model_path, 'modules.json')):    #Load as SentenceTransformer model
---> 95     modules = self._load_sbert_model(model_path)
     96 else:   #Load with AutoModel
     97     modules = self._load_auto_model(model_path)

File ~/temp/.venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py:840, in SentenceTransformer._load_sbert_model(self, model_path)
    838 for module_config in modules_config:
    839     module_class = import_from_string(module_config['type'])
--> 840     module = module_class.load(os.path.join(model_path, module_config['path']))
...
    117 with open(os.path.join(input_path, 'config.json')) as fIn:
    118     config = json.load(fIn)
--> 120 return Pooling(**config)

TypeError: Pooling.__init__() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'
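
The failure above looks like a saved config carrying a key that the installed `Pooling` constructor does not accept. A minimal sketch of that mechanism follows. `StandInPooling` and `split_config` are hypothetical names used only for illustration; this is not the real sentence-transformers class, and the config values are made up.

```python
import inspect


class StandInPooling:
    """Hypothetical stand-in for an older Pooling class (not the real one)."""

    def __init__(self, word_embedding_dimension, pooling_mode_mean_tokens=True):
        self.word_embedding_dimension = word_embedding_dimension
        self.pooling_mode_mean_tokens = pooling_mode_mean_tokens


def split_config(cls, config):
    """Split config into keys cls.__init__ accepts and keys it would reject."""
    accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
    known = {k: v for k, v in config.items() if k in accepted}
    unknown = sorted(k for k in config if k not in accepted)
    return known, unknown


# Illustrative config with one key the stand-in constructor does not know.
config = {
    "word_embedding_dimension": 768,
    "pooling_mode_mean_tokens": True,
    "pooling_mode_weightedmean_tokens": False,
}

error_message = None
try:
    StandInPooling(**config)  # same call shape as Pooling(**config)
except TypeError as err:
    error_message = str(err)

known, unknown = split_config(StandInPooling, config)
print(error_message)
print("rejected keys:", unknown)
```

Listing the rejected keys this way only diagnoses the mismatch; whether dropping them is safe depends on the model.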

Here is the original code:

import time

from sentence_transformers import SentenceTransformer
input_texts = [
    'query: how much protein should a female eat',
    'query: summit define',
    "passage: As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "passage: Definition of summit for English Language Learners. : 1  the highest point of a mountain : the top of a mountain. : 2  the highest level. : 3  a meeting or series of meetings between the leaders of two or more governments."
]

times = 1
repos = ['hkunlp/instructor-large']

for repo in repos:
    model = SentenceTransformer(repo)
    model_time = 0
    for _ in range(times):
        time_start = time.time()
        embeddings = model.encode(input_texts, normalize_embeddings=True)
        time_taken = time.time() - time_start
        model_time += time_taken
    print(f'{repo} time: {model_time}')

I am using Python 3.10.12 on Linux OS.

NLP Group of The University of Hong Kong org

Hi, what is your sentence-transformers version? FYI, I installed version 2.2.2.

Same, I am using 2.2.2 as well

NLP Group of The University of Hong Kong org

Oh, you should use INSTRUCTOR to load the model:

from InstructorEmbedding import INSTRUCTOR
model = INSTRUCTOR('hkunlp/instructor-large')

For details, you may refer to https://github.com/xlang-ai/instructor-embedding#getting-started.
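
A minimal sketch of the suggestion above, assuming the `InstructorEmbedding` package is installed and that `encode` takes `[instruction, text]` pairs as described in the linked README. The helper names and the instruction wording are illustrative assumptions, not part of the library.

```python
def build_instruction_pairs(instruction, texts):
    """Pair one instruction with each text: [[instruction, text], ...]."""
    return [[instruction, text] for text in texts]


def embed_with_instructor(texts, instruction, model_name="hkunlp/instructor-large"):
    """Load the model through INSTRUCTOR and encode instruction/text pairs."""
    # Imported lazily so the pairing helper works without the package installed.
    from InstructorEmbedding import INSTRUCTOR

    model = INSTRUCTOR(model_name)
    return model.encode(build_instruction_pairs(instruction, texts))


# The instruction wording here is an assumption chosen for illustration.
pairs = build_instruction_pairs(
    "Represent the question for retrieving supporting documents:",
    ["how much protein should a female eat", "summit define"],
)
print(pairs)
```

Calling `embed_with_instructor(texts, instruction)` downloads the model on first use, so it is left out of the demo lines above.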

zhiminy changed discussion status to closed

ers\SentenceTransformer.py", line 194, in __init__
modules = self._load_sbert_model(
^^^^^^^^^^^^^^^^^^^^^^^
TypeError: INSTRUCTOR._load_sbert_model() got an unexpected keyword argument 'token'

Can someone help? I am getting the error below.

TypeError: _load_sbert_model() got an unexpected keyword argument 'token'

TypeError Traceback (most recent call last)
in

----> 2 model = INSTRUCTOR('/instructor-large/')
3 device = 'cuda'
4 model.to(device)
5 model.eval()

/local_disk0/.ephemeral_nfs/envs/pythonEnv-8f7450cd-10dc-4b27-9ec2-64ba748b19f9/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py in __init__(self, model_name_or_path, modules, device, cache_folder, trust_remote_code, revision, token, use_auth_token)
192
193 if is_sentence_transformer_model(model_name_or_path, token, cache_folder=cache_folder, revision=revision):
--> 194 modules = self._load_sbert_model(
195 model_name_or_path,
196 token=token,

TypeError: _load_sbert_model() got an unexpected keyword argument 'token'
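
The traceback above suggests that the base class now passes `token=` to a `_load_sbert_model` override that was written for an older signature. The sketch below reproduces that pattern with stand-in classes. All class and method names here are hypothetical, and none of this is the real library code.

```python
class NewerBase:
    """Stand-in for a newer base class that forwards an extra keyword."""

    def __init__(self, path, token=None):
        self.modules = self._load_modules(path, token=token)

    def _load_modules(self, path, token=None):
        return [path]


class OlderSubclass(NewerBase):
    """Stand-in for a subclass written against the older signature."""

    def _load_modules(self, path):
        return [path]


class UpdatedSubclass(NewerBase):
    """Same override, now accepting (and ignoring) extra keywords."""

    def _load_modules(self, path, **kwargs):
        return [path]


mismatch = None
try:
    OlderSubclass("some/model")
except TypeError as err:
    mismatch = str(err)

fixed = UpdatedSubclass("some/model")
print(mismatch)
print(fixed.modules)
```

If this reading is right, the error goes away either by pinning the base library to a version that matches the override or by updating the override, which is what the replies further down in the thread report.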

To solve this problem, use the SentenceTransformer module separately in your program.

import streamlit as st
from pypdf import PdfReader
from dotenv import load_dotenv
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.huggingface import HuggingFaceInstructEmbeddings
from langchain.vectorstores.faiss import FAISS
import torch

from sentence_transformers import SentenceTransformer # Use SentenceTransformer module to use Hugging face Model

def embedding_store(chunked_text):
    # embeddings = OpenAIEmbeddings()  # Creating an object of class OpenAIEmbeddings

    model = SentenceTransformer('hkunlp/instructor-xl')
    model_kwargs = {'device': 'cpu'}
    encode_kwargs = {'normalize_embeddings': True}

    embeddings = HuggingFaceInstructEmbeddings(model_name=model,model_kwargs=model_kwargs,encode_kwargs=encode_kwargs)

    vectore_store = FAISS.from_texts(embedding=embeddings,texts=chunked_text)

    return vectore_store

Hugging_Face1.png
Hugging_Face2.png

Hugging_Face3.png

@utkarshkrc2 I got the following:

ValidationError: 1 validation error for HuggingFaceInstructEmbeddings
model_name
str type expected (type=type_error.str)

Which version of langchain are you using? I tested two versions (0.1.9 and the latest 0.1.11) but got the same problem with both. They don't accept SentenceTransformer as an argument.

@utkarshkrc2, I am getting the same error too:

ValidationError: 1 validation error for HuggingFaceInstructEmbeddings model_name str type expected (type=type_error.str)

Below is the code I am using:

def get_vectorstore(text_chunks):
    model = SentenceTransformer('hkunlp/instructor-xl')  # choosing a different model does not make a difference; tried thenlper/gte-base and hkunlp/instructor-large
    model_kwargs = {'device': 'cpu'}  # changing the device to gpu didn't make any difference
    encode_kwargs = {'normalize_embeddings': True}

    embeddings = HuggingFaceInstructEmbeddings(model_name=model,model_kwargs=model_kwargs,encode_kwargs=encode_kwargs)
    #embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    return vectorstore

These are the versions of langchain and sentence-transformers I am using:
Name: sentence-transformers
Version: 2.5.1
Name: langchain
Version: 0.1.12

This error comes after it processes the PDF and finishes generating some of the embeddings, as you can see from the logs below:

modules.json: 100%|████████████████████████████████████████████████████████████████████████████████| 461/461 [00:00<00:00, 647kB/s]
config_sentence_transformers.json: 100%|███████████████████████████████████████████████████████████| 122/122 [00:00<00:00, 338kB/s]
README.md: 100%|██████████████████████████████████████████████████████████████████████████████| 66.3k/66.3k [00:00<00:00, 7.72MB/s]
sentence_bert_config.json: 100%|█████████████████████████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 206kB/s]
config.json: 100%|████████████████████████████████████████████████████████████████████████████| 1.53k/1.53k [00:00<00:00, 6.43MB/s]
pytorch_model.bin: 100%|██████████████████████████████████████████████████████████████████████| 1.34G/1.34G [02:01<00:00, 11.0MB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████| 2.41k/2.41k [00:00<00:00, 5.16MB/s]
spiece.model: 100%|██████████████████████████████████████████████████████████████████████████████| 792k/792k [00:01<00:00, 771kB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████| 2.42M/2.42M [00:00<00:00, 5.06MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 25.0MB/s]
1_Pooling/config.json: 100%|███████████████████████████████████████████████████████████████████████| 270/270 [00:00<00:00, 885kB/s]
2_Dense/config.json: 100%|█████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 459kB/s]
2_Dense/pytorch_model.bin: 100%|██████████████████████████████████████████████████████████████| 3.15M/3.15M [00:00<00:00, 8.11MB/s]
2024-03-15 22:19:04.739 Uncaught app exception
Traceback (most recent call last):

File "/test/LangchainApp.py", line 51, in get_vectorstore
embeddings = HuggingFaceInstructEmbeddings(model_name=model,model_kwargs=model_kwargs,encode_kwargs=encode_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/test/.venv/lib/python3.12/site-packages/langchain_community/embeddings/huggingface.py", line 149, in __init__
super().__init__(**kwargs)
File "/test/.venv/lib/python3.12/site-packages/pydantic/v1/main.py", line 341, in __init__
raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for HuggingFaceInstructEmbeddings
model_name
str type expected (type=type_error.str)

Would be really helpful if someone has any idea why I get this error.

Hi everyone,
Please use the same version of Hugging Face Transformers as I used.
Also, for the validation error, please install the Python "setuptools" package.

Please ping me to let me know whether it works in your case. I also got the same error while executing my program.

Hugging_Face_Transformer_Version.png
Install Setup Tool.png
Working_VectorEmbedding.png

@utkarshkrc2, thanks for replying. I did try downgrading sentence-transformers to 2.2.2, since I noticed some threads with similar issues where that version helped. However, other dependencies I am using didn't work with it.

I will try to downgrade the other dependencies also and test it with that.

Do you know if it's a bug in newer versions of sentence-transformers?

@utkarshkrc2, I fixed the dependencies and got sentence-transformers down to 2.2.2. It seems to have gotten past the previous error, but now I get this error:

File "/test/LangchainApp.py", line 48, in get_vectorstore
model = SentenceTransformer('hkunlp/instructor-xl')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/test/.venv/lib/python3.12/site-packages/sentence_transformers/SentenceTransformer.py", line 95, in __init__
modules = self._load_sbert_model(model_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/test/.venv/lib/python3.12/site-packages/sentence_transformers/SentenceTransformer.py", line 840, in _load_sbert_model
module = module_class.load(os.path.join(model_path, module_config['path']))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/test/.venv/lib/python3.12/site-packages/sentence_transformers/models/Pooling.py", line 120, in load
return Pooling(**config)
^^^^^^^^^^^^^^^^^

Please let me know if you can help. I can ping you (do let me know how).

@utkarshkrc2, FYI, the code you showed in your comment does not work. Even with sentence-transformers 2.2.2 and langchain 0.1.2, we get the "TypeError: Pooling.__init__() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'" error.

If we change the config.json used by Pooling.py file and remove 'pooling_mode_weightedmean_tokens' and 'pooling_mode_lasttoken' as per https://huggingface.co/hkunlp/instructor-base/discussions/6, we are back to the original error of 'ValidationError: 1 validation error for HuggingFaceInstructEmbeddings model_name str type expected (type=type_error.str)'.
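
For reference, the config edit described above could be scripted roughly as follows. This is a sketch only: the function name is hypothetical, the demo runs on a throwaway file with made-up values, and a real run would need the path of the cached `1_Pooling/config.json`, which differs per machine.

```python
import json
import os
import tempfile

# Keys named in this thread as rejected by the older Pooling class.
UNSUPPORTED_KEYS = ("pooling_mode_weightedmean_tokens", "pooling_mode_lasttoken")


def strip_unsupported_keys(config_path, keys=UNSUPPORTED_KEYS):
    """Remove the given keys from a JSON config file in place; return what was removed."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    removed = [k for k in keys if k in config]
    for k in removed:
        del config[k]
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return removed


# Demo on a temporary file so nothing in the real model cache is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "config.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump(
            {
                "word_embedding_dimension": 768,
                "pooling_mode_mean_tokens": True,
                "pooling_mode_weightedmean_tokens": False,
                "pooling_mode_lasttoken": False,
            },
            f,
        )
    removed = strip_unsupported_keys(demo_path)
    with open(demo_path, encoding="utf-8") as f:
        cleaned = json.load(f)

print(removed)
print(cleaned)
```

Editing a cached file is fragile, since a fresh download restores the original config, so it is worth keeping a backup of the file before changing it.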

Still trying to figure out how to solve this.

Edit: It worked for me after changing the HuggingFaceInstructEmbeddings constructor call to pass only the model name and no other arguments. Also, I didn't need to downgrade langchain; I am still using 0.1.13.

HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
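
The ValidationError earlier in the thread says `model_name` must be a string, so the key point is to pass the repo id rather than a loaded `SentenceTransformer` object. The sketch below adds a small guard around that. The helper names are hypothetical, and passing `model_kwargs` and `encode_kwargs` alongside a string name is an assumption that was not confirmed in this thread; the import path is taken from the traceback above.

```python
def instruct_embedding_args(model_name, device="cpu", normalize=True):
    """Build constructor arguments, insisting that model_name is a string."""
    if not isinstance(model_name, str):
        raise TypeError(
            f"model_name must be a repo id or path string, got {type(model_name).__name__}"
        )
    return {
        "model_name": model_name,
        "model_kwargs": {"device": device},
        "encode_kwargs": {"normalize_embeddings": normalize},
    }


def make_embeddings(model_name, **options):
    """Create the LangChain wrapper from a string model name."""
    # Imported lazily so the argument helper works without langchain installed.
    from langchain_community.embeddings import HuggingFaceInstructEmbeddings

    return HuggingFaceInstructEmbeddings(**instruct_embedding_args(model_name, **options))


args = instruct_embedding_args("hkunlp/instructor-xl")
print(args)
```

With this guard, passing a model object fails immediately with a readable message instead of surfacing later as a pydantic validation error.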

Hi, I am getting
TypeError: INSTRUCTOR._load_sbert_model() got an unexpected keyword argument 'token'
I am using langchain==0.1.2 and sentence-transformers==2.2.2

zhiminy changed discussion status to open

langchain_huggingface
tensorboard
tensorflow
setuptools
transformer_sentence 2.2.2
instructor
embeddinginstructor
embeddings
transformer
These are the ones that needed to be installed in my case.

Any update with the new version?

not the transformer_sentence 2.2.2

I already updated; the problem still exists.
InstructorEmbedding==1.0.1
sentence-transformers==3.0.1

Works for me.
Python3.11, pip3.11 installed.
pip3.11 list
Package Version
attrs 24.2.0
cattrs 24.1.1
certifi 2024.8.30
charset-normalizer 3.3.2
click 8.1.7
coremltools 8.0
filelock 3.16.1
fsspec 2024.9.0
huggingface-hub 0.25.0
idna 3.10
InstructorEmbedding 1.0.1
Jinja2 3.1.4
joblib 1.4.2
MarkupSafe 2.1.5
mpmath 1.3.0
networkx 3.3
nltk 3.9.1
numpy 2.1.1
onnx 1.16.2
onnx-coreml 1.3
packaging 24.1
pillow 10.4.0
pip 24.2
protobuf 5.28.1
pyaml 24.7.0
pybind11 2.13.5
PyYAML 6.0.2
regex 2024.9.11
requests 2.32.3
safetensors 0.4.5
scikit-learn 1.5.2
scipy 1.14.1
sentence-transformers 2.2.2
sentencepiece 0.2.0
setuptools 74.1.2
sympy 1.13.3
threadpoolctl 3.5.0
tokenizers 0.19.1
torch 2.4.1
torchvision 0.19.1
tqdm 4.66.5
transformers 4.44.2
typing 3.7.4.3
typing_extensions 4.12.2
urllib3 2.2.3
wheel 0.44.0

I want to use a newer version of sentence-transformers, but this issue is still not fixed. :( Does anyone have a workaround?

Yep, I had to download the pip library because it was not being kept up to date, and none of the PRs or commits would work for the latest version of sentence-transformers.

To use with the latest version of sentence-transformers (3.3.1), install this modified version:

pip install git+https://github.com/NoahBPeterson/instructor-embedding.git@54076ec450d9825cf84f1ed6e54a5748f6877070
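
Since most reports in this thread come down to which versions are installed together, a quick way to check is to print them before and after installing. This is a small sketch using only the standard library; the function name is hypothetical and the package list is just the ones mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["sentence-transformers", "InstructorEmbedding", "langchain"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Including this output when posting an error makes it easier for others to tell which of the cases above applies.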

Thank you!!
