Sentence Transformers 🤗

Sentence Transformers 🤗 is a Python framework for state-of-the-art sentence, text and image embeddings. It can be used to compute embeddings with Sentence Transformer models or to calculate similarity scores with Cross-Encoder (a.k.a. reranker) models. This unlocks a wide range of applications, including semantic search, semantic textual similarity, and paraphrase mining. Optimum Neuron offers APIs that ease the use of Sentence Transformers on AWS Neuron devices.
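As a quick reminder of the workflow this page accelerates, here is a minimal sketch of plain Sentence Transformers usage (the checkpoint all-MiniLM-L6-v2 is just an arbitrary small example):

from sentence_transformers import SentenceTransformer, util

# Any Sentence Transformer checkpoint works here; all-MiniLM-L6-v2 is a small example model.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode two sentences and score their semantic similarity.
embeddings = model.encode(["How do I bake bread?", "A recipe for baking bread"])
print(util.cos_sim(embeddings[0], embeddings[1]))  # high score for paraphrases

The rest of this page shows how to compile such models for Neuron and run the same workflow on AWS accelerators.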

Export to Neuron

Option 1: CLI

  • Example - Text embeddings
optimum-cli export neuron -m BAAI/bge-large-en-v1.5 --sequence_length 384 --batch_size 1 --task feature-extraction bge_emb_neuron/
  • Example - Image Search
optimum-cli export neuron -m sentence-transformers/clip-ViT-B-32 --sequence_length 64 --text_batch_size 3 --image_batch_size 1 --num_channels 3 --height 224 --width 224 --task feature-extraction --subfolder 0_CLIPModel clip_emb_neuron/
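Once the export finishes, the compiled artifacts can be loaded back without recompiling; this is a minimal sketch using the output directory from the text embeddings CLI example above:

from optimum.neuron import NeuronModelForSentenceTransformers

# The directory already contains the NEFF-embedded TorchScript module,
# so no export=True flag is needed at load time.
model = NeuronModelForSentenceTransformers.from_pretrained("bge_emb_neuron/")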

Option 2: Python API

  • Example - Text embeddings
from optimum.neuron import NeuronModelForSentenceTransformers

# configs for compiling model
input_shapes = {
    "batch_size": 1,
    "sequence_length": 384,
}
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}

neuron_model = NeuronModelForSentenceTransformers.from_pretrained(
    "BAAI/bge-large-en-v1.5", 
    export=True, 
    **input_shapes,
    **compiler_args,
)

# Save locally
neuron_model.save_pretrained("bge_emb_neuron/")

# Upload to the HuggingFace Hub
neuron_model.push_to_hub(
    "bge_emb_neuron/", repository_id="optimum/bge-base-en-v1.5-neuronx"  # Replace with your HF Hub repo id
)
  • Example - Image Search
from optimum.neuron import NeuronModelForSentenceTransformers

# configs for compiling model
input_shapes = {
    "num_channels": 3,
    "height": 224,
    "width": 224,
    "text_batch_size": 3,
    "image_batch_size": 1,
    "sequence_length": 64,
}
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}

neuron_model = NeuronModelForSentenceTransformers.from_pretrained(
    "sentence-transformers/clip-ViT-B-32", 
    subfolder="0_CLIPModel", 
    export=True, 
    dynamic_batch_size=False, 
    **input_shapes,
    **compiler_args,
)

# Save locally
neuron_model.save_pretrained("clip_emb_neuron/")

# Upload to the HuggingFace Hub
neuron_model.push_to_hub(
    "clip_emb_neuron/", repository_id="optimum/clip_vit_emb_neuronx"  # Replace with your HF Hub repo id
)
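Because the text model above was traced with a static batch size of 1, a simple way to score a pair of sentences is to embed them one at a time and compare the results. The helper below is a hypothetical sketch (embed is not part of the library, and it assumes the tokenizer was saved alongside the compiled model):

from transformers import AutoTokenizer
from sentence_transformers import util
from optimum.neuron import NeuronModelForSentenceTransformers

tokenizer = AutoTokenizer.from_pretrained("bge_emb_neuron/")
model = NeuronModelForSentenceTransformers.from_pretrained("bge_emb_neuron/")

def embed(text):
    # One sentence per call, padded to the static sequence length used at compile time.
    inputs = tokenizer(text, padding="max_length", max_length=384, truncation=True, return_tensors="pt")
    return model(**inputs).sentence_embedding

print(util.cos_sim(embed("How do I bake bread?"), embed("A recipe for baking bread")))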

NeuronModelForSentenceTransformers

class optimum.neuron.NeuronModelForSentenceTransformers


( model: ScriptModule, config: PretrainedConfig, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, model_file_name: typing.Optional[str] = None, preprocessors: typing.Optional[typing.List] = None, neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None, **kwargs )

Parameters

  • config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
  • model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with the embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.

Neuron model for Sentence Transformers on AWS Neuron devices.

This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward


( input_ids: Tensor, attention_mask: Tensor, pixel_values: typing.Optional[torch.Tensor] = None, token_type_ids: typing.Optional[torch.Tensor] = None, **kwargs )

Parameters

  • input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
  • attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
  • pixel_values (Union[torch.Tensor, None] of shape (batch_size, num_channels, height, width), defaults to None) — Pixel values of input images, used for multimodal models such as CLIP.
  • token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.

The NeuronModelForSentenceTransformers forward method overrides the __call__ special method. It accepts only the inputs traced during the compilation step; any additional inputs provided at inference will be ignored. To include extra inputs, recompile the model with those inputs specified.

Text Example:

>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForSentenceTransformers

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bge-base-en-v1.5-neuronx")
>>> model = NeuronModelForSentenceTransformers.from_pretrained("optimum/bge-base-en-v1.5-neuronx")

>>> inputs = tokenizer("In the smouldering promise of the fall of Troy, a mythical world of gods and mortals rises from the ashes.", return_tensors="pt")

>>> outputs = model(**inputs)
>>> token_embeddings = outputs.token_embeddings
>>> sentence_embedding = outputs.sentence_embedding
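The returned sentence_embedding can be fed directly into similarity utilities; a small sketch with an arbitrary second sentence:

>>> from sentence_transformers import util
>>> other = model(**tokenizer("Gods and mortals clash in the ruins of Troy.", return_tensors="pt"))
>>> print(util.cos_sim(sentence_embedding, other.sentence_embedding))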

Image Example:

>>> from PIL import Image
>>> from transformers import AutoProcessor
>>> from sentence_transformers import util
>>> from optimum.neuron import NeuronModelForSentenceTransformers

>>> processor = AutoProcessor.from_pretrained("optimum/clip_vit_emb_neuronx")
>>> model = NeuronModelForSentenceTransformers.from_pretrained("optimum/clip_vit_emb_neuronx")
>>> util.http_get("https://github.com/UKPLab/sentence-transformers/raw/master/examples/sentence_transformer/applications/image-search/two_dogs_in_snow.jpg", "two_dogs_in_snow.jpg")
>>> inputs = processor(
...     text=["Two dogs in the snow", "A cat on a table", "A picture of London at night"],
...     images=Image.open("two_dogs_in_snow.jpg"),
...     return_tensors="pt",
...     padding=True,
... )

>>> outputs = model(**inputs)
>>> cos_scores = util.cos_sim(outputs.image_embeds, outputs.text_embeds)  # Compute cosine similarities
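To turn the similarity matrix into a prediction, take the argmax over the text candidates (a short follow-up sketch; cos_scores has shape (num_images, num_texts)):

>>> best = cos_scores.argmax().item()
>>> print(["Two dogs in the snow", "A cat on a table", "A picture of London at night"][best])  # expected: "Two dogs in the snow"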