sentence_transformers_support
Hello!
Preface
I'm a big fan of these sparse models!
Pull Request Overview
Add support for loading this model with the Sentence Transformers library.
Details
The SentenceTransformer library will soon add support for sparse models through the SparseEncoder class.
We would like to add support for this model, and with this PR it is now properly handled.
We modified as little as possible, so it should work with any other custom loading logic you may have.
You will first need to install the current version of the library:
pip install git+https://github.com/arthurbr11/sentence-transformers.git@sparse_implementation
Before merging, feel free to test this PR by passing revision="refs/pr/5" to AutoTokenizer, AutoModelForMaskedLM, etc. in your custom code, or by running the snippet below:
from sentence_transformers import SparseEncoder
# Download from the 🤗 Hub
model = SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v1", revision="refs/pr/5")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 30522)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
# Let's decode our embeddings to be able to interpret them
decoded = model.decode(embeddings, top_k=10)
for decoded_tokens, sentence in zip(decoded, sentences):
    print(f"Sentence: {sentence}")
    print(f"Decoded: {decoded_tokens}")
    print()
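If your custom loading code goes through transformers directly instead of SparseEncoder, the same PR revision can be passed to from_pretrained. The sketch below is only an illustration: the helper names pr_revision and load_from_pr are hypothetical and not part of any library, and the transformers import is kept inside the function so that defining the helper does not trigger a download.

```python
MODEL_ID = "opensearch-project/opensearch-neural-sparse-encoding-doc-v1"


def pr_revision(pr_number: int) -> str:
    """Return the git ref under which the Hub exposes a pull request."""
    return f"refs/pr/{pr_number}"


def load_from_pr(model_id: str = MODEL_ID, pr_number: int = 5):
    """Load tokenizer and masked-LM weights from a Hub pull request.

    Hypothetical helper: it only forwards `revision` to the standard
    `from_pretrained` calls.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    revision = pr_revision(pr_number)
    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    model = AutoModelForMaskedLM.from_pretrained(model_id, revision=revision)
    return tokenizer, model
```

Calling load_from_pr() needs network access and downloads the files from refs/pr/5, so it is best run once in a scratch environment before merging.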
cc @tomaarsen
Arthur BRESNU
Feel free to let us know if you have any questions about the files that we're proposing to add here.
For additional context, this is what we'd expect to get as outputs for the similarity and the decoded embeddings:
tensor([[ 7.2913, 1.8760, 0.0035],
        [ 1.8760, 6.4976, 0.1080],
        [ 0.0035, 0.1080, 9.8219]], device='cuda:0')
Sentence: The weather is lovely today.
Decoded: [('weather', 1.405685305595398), ('today', 1.1451733112335205), ('lovely', 0.8350375890731812), ('climate', 0.6556388735771179), ('forecast', 0.5856578946113586), ('beautiful', 0.5536007881164551), ('day', 0.5009242296218872), ('tomorrow', 0.4879005551338196), ('yesterday', 0.481747567653656), ('nice', 0.44232678413391113)]
Sentence: It's so sunny outside!
Decoded: [('outside', 0.9889683723449707), ('sunny', 0.8924372792243958), ('weather', 0.8884875774383545), ('lyrics', 0.6884512901306152), ('so', 0.6462645530700684), ('outdoors', 0.6106253862380981), ('sunshine', 0.5346807241439819), ('outdoor', 0.5262255668640137), ('song', 0.4600299596786499), ('out', 0.4593561589717865)]
Sentence: He drove to the stadium.
Decoded: [('stadium', 0.9480016231536865), ('drive', 0.7638173699378967), ('driving', 0.704725444316864), ('drove', 0.6354855298995972), ('stadiums', 0.5713939070701599), ('driver', 0.5664985775947571), ('football', 0.5491527318954468), ('car', 0.519797682762146), ('baseball', 0.46990716457366943), ('drivers', 0.4272884130477905)]
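One rough way to read the similarity matrix against the decoded tokens is to assume that the score is a plain dot product over shared tokens. That is an assumption on our side, although it is consistent with the diagonal not being 1. The sketch below uses only the top-10 tokens shown above, with weights rounded to four decimals, so it recovers just part of each score. The helper name sparse_dot is illustrative and not a library function.

```python
from __future__ import annotations


def sparse_dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two sparse vectors stored as {token: weight}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum((w * b[t] for t, w in a.items() if t in b), 0.0)


# Top-10 (token, weight) pairs from the decoded output, rounded to 4 decimals.
weather = {
    "weather": 1.4057, "today": 1.1452, "lovely": 0.8350, "climate": 0.6556,
    "forecast": 0.5857, "beautiful": 0.5536, "day": 0.5009,
    "tomorrow": 0.4879, "yesterday": 0.4817, "nice": 0.4423,
}
sunny = {
    "outside": 0.9890, "sunny": 0.8924, "weather": 0.8885, "lyrics": 0.6885,
    "so": 0.6463, "outdoors": 0.6106, "sunshine": 0.5347, "outdoor": 0.5262,
    "song": 0.4600, "out": 0.4594,
}
stadium = {
    "stadium": 0.9480, "drive": 0.7638, "driving": 0.7047, "drove": 0.6355,
    "stadiums": 0.5714, "driver": 0.5665, "football": 0.5492, "car": 0.5198,
    "baseball": 0.4699, "drivers": 0.4273,
}

shared = sorted(set(weather) & set(sunny))
print(shared)                                  # ['weather']
print(round(sparse_dot(weather, sunny), 3))    # 1.249
print(sparse_dot(weather, stadium))            # 0.0
```

Under that assumption, the single shared token "weather" already accounts for about 1.25 of the 1.8760 score between the first two sentences. The first and third sentences share no top-10 token, which matches their near-zero score of 0.0035.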
- Tom Aarsen