sentence_transformers_support

#5
by arthurbresnu HF Staff - opened

Hello!

Preface

I'm a big fan of these sparse models!

Pull Request Overview

Add support for this model in the Sentence Transformers library.

Details

The Sentence Transformers library will soon add support for sparse models through the SparseEncoder class.
We would like to add support for this model, and with this PR it is now properly handled.
We modified as little as possible, so it should work with any other custom loading logic you may have.

You will first need to install the current version of the library:

pip install git+https://github.com/arthurbr11/sentence-transformers.git@sparse_implementation

To test this PR before merging, pass revision="refs/pr/5" to AutoTokenizer, AutoModelForMaskedLM, etc. in your custom code, or run the snippet below:

from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v1", revision="refs/pr/5")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 30522)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

# Let's decode our embeddings to be able to interpret them
decoded = model.decode(embeddings, top_k=10)
for decoded_terms, sentence in zip(decoded, sentences):
    print(f"Sentence: {sentence}")
    print(f"Decoded: {decoded_terms}")
    print()
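
For the custom-code path mentioned above, here is a minimal sketch of loading the tokenizer and masked-LM weights from this PR's branch with plain transformers. The helper name `load_from_pr` and the two constants are only illustrative; the `revision` argument of `from_pretrained` is what selects the PR branch.

```python
MODEL_ID = "opensearch-project/opensearch-neural-sparse-encoding-doc-v1"
PR_REVISION = "refs/pr/5"  # the Hub ref for this pull request


def load_from_pr(model_id=MODEL_ID, revision=PR_REVISION):
    """Load the tokenizer and masked-LM model from the PR branch.

    `load_from_pr` is a hypothetical helper for this example, not part of
    any library. Calling it downloads files from the Hub.
    """
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    model = AutoModelForMaskedLM.from_pretrained(model_id, revision=revision)
    return tokenizer, model
```

Calling `tokenizer, model = load_from_pr()` should then let you plug the PR's files into whatever custom encoding logic you already have.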

cc @tomaarsen

Arthur BRESNU

arthurbresnu changed pull request status to open

Feel free to let us know if you have any questions about the files that we're proposing to add here.

For additional context, this is what we'd expect to get as outputs for the similarity and the decoded embeddings:

tensor([[    7.2913,     1.8760,     0.0035],
        [    1.8760,     6.4976,     0.1080],
        [    0.0035,     0.1080,     9.8219]], device='cuda:0')
Sentence: The weather is lovely today.
Decoded: [('weather', 1.405685305595398), ('today', 1.1451733112335205), ('lovely', 0.8350375890731812), ('climate', 0.6556388735771179), ('forecast', 0.5856578946113586), ('beautiful', 0.5536007881164551), ('day', 0.5009242296218872), ('tomorrow', 0.4879005551338196), ('yesterday', 0.481747567653656), ('nice', 0.44232678413391113)]

Sentence: It's so sunny outside!
Decoded: [('outside', 0.9889683723449707), ('sunny', 0.8924372792243958), ('weather', 0.8884875774383545), ('lyrics', 0.6884512901306152), ('so', 0.6462645530700684), ('outdoors', 0.6106253862380981), ('sunshine', 0.5346807241439819), ('outdoor', 0.5262255668640137), ('song', 0.4600299596786499), ('out', 0.4593561589717865)]

Sentence: He drove to the stadium.
Decoded: [('stadium', 0.9480016231536865), ('drive', 0.7638173699378967), ('driving', 0.704725444316864), ('drove', 0.6354855298995972), ('stadiums', 0.5713939070701599), ('driver', 0.5664985775947571), ('football', 0.5491527318954468), ('car', 0.519797682762146), ('baseball', 0.46990716457366943), ('drivers', 0.4272884130477905)]
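
To help read these numbers, here is a small sketch that relates the decoded terms to the similarity matrix. It assumes the similarity scores are plain dot products over the vocabulary dimensions (the non-unit diagonal suggests this, but it is an assumption), and it only uses the top-10 weights printed above, rounded to 4 decimals. The helper names `shared_terms` and `sparse_dot` are made up for this example.

```python
# Top-10 (token, weight) pairs copied from the decoded output, rounded to 4 decimals.
weather = {"weather": 1.4057, "today": 1.1452, "lovely": 0.8350, "climate": 0.6556,
           "forecast": 0.5857, "beautiful": 0.5536, "day": 0.5009, "tomorrow": 0.4879,
           "yesterday": 0.4817, "nice": 0.4423}
sunny = {"outside": 0.9890, "sunny": 0.8924, "weather": 0.8885, "lyrics": 0.6885,
         "so": 0.6463, "outdoors": 0.6106, "sunshine": 0.5347, "outdoor": 0.5262,
         "song": 0.4600, "out": 0.4594}
stadium = {"stadium": 0.9480, "drive": 0.7638, "driving": 0.7047, "drove": 0.6355,
           "stadiums": 0.5714, "driver": 0.5665, "football": 0.5492, "car": 0.5198,
           "baseball": 0.4699, "drivers": 0.4273}


def shared_terms(a, b):
    """Tokens that carry a non-zero weight in both truncated vectors."""
    return sorted(a.keys() & b.keys())


def sparse_dot(a, b):
    """Dot product restricted to the tokens both vectors share."""
    return sum(a[token] * b[token] for token in shared_terms(a, b))


print(shared_terms(weather, sunny))          # ['weather']
print(round(sparse_dot(weather, sunny), 3))  # 1.249
print(sparse_dot(weather, stadium))          # 0 (no shared top-10 tokens)
```

Under that assumption, the single shared top-10 token "weather" accounts for roughly 1.25 of the 1.8760 score between the first two sentences; the remainder would come from tokens outside the top 10. The first and third sentences share no top-10 tokens, which is consistent with their near-zero score of 0.0035.
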
- Tom Aarsen
zhichao-geng changed pull request status to merged
