Sentence Transformers 🤗

Sentence Transformers 🤗 is a Python framework for state-of-the-art sentence, text and image embeddings. It can be used to compute embeddings with Sentence Transformer models or to calculate similarity scores with Cross-Encoder (a.k.a. reranker) models. This unlocks a wide range of applications, including semantic search, semantic textual similarity, and paraphrase mining. Optimum Neuron offers APIs that ease the use of Sentence Transformers on AWS Neuron devices.
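As a quick reminder of the workflow this page accelerates, here is a minimal sketch of plain Sentence Transformers usage (the checkpoint all-MiniLM-L6-v2 is just an arbitrary small example):

from sentence_transformers import SentenceTransformer, util

# Any Sentence Transformer checkpoint works here; all-MiniLM-L6-v2 is a small example model.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode two sentences and score their semantic similarity.
embeddings = model.encode(["How do I bake bread?", "A recipe for baking bread"])
print(util.cos_sim(embeddings[0], embeddings[1]))  # high score for paraphrases

The rest of this page shows how to compile such models for Neuron and run the same workflow on AWS accelerators.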

Export to Neuron

Option 1: CLI

  • Example - Text embeddings
optimum-cli export neuron -m BAAI/bge-large-en-v1.5 --sequence_length 384 --batch_size 1 --task feature-extraction bge_emb_neuron/
  • Example - Image Search
optimum-cli export neuron -m sentence-transformers/clip-ViT-B-32 --sequence_length 64 --text_batch_size 3 --image_batch_size 1 --num_channels 3 --height 224 --width 224 --task feature-extraction --subfolder 0_CLIPModel clip_emb_neuron/
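Once the export finishes, the compiled artifacts can be loaded back without recompiling; this is a minimal sketch using the output directory from the text embeddings CLI example above:

from optimum.neuron import NeuronModelForSentenceTransformers

# The directory already contains the NEFF-embedded TorchScript module,
# so no export=True flag is needed at load time.
model = NeuronModelForSentenceTransformers.from_pretrained("bge_emb_neuron/")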

Option 2: Python API

  • Example - Text embeddings
from optimum.neuron import NeuronModelForSentenceTransformers

# configs for compiling model
input_shapes = {
    "batch_size": 1,
    "sequence_length": 384,
}
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}

neuron_model = NeuronModelForSentenceTransformers.from_pretrained(
    "BAAI/bge-large-en-v1.5", 
    export=True, 
    **input_shapes,
    **compiler_args,
)

# Save locally
neuron_model.save_pretrained("bge_emb_neuron/")

# Upload to the HuggingFace Hub
neuron_model.push_to_hub(
    "bge_emb_neuron/", repository_id="optimum/bge-base-en-v1.5-neuronx"  # Replace with your HF Hub repo id
)
  • Example - Image Search
from optimum.neuron import NeuronModelForSentenceTransformers

# configs for compiling model
input_shapes = {
    "num_channels": 3,
    "height": 224,
    "width": 224,
    "text_batch_size": 3,
    "image_batch_size": 1,
    "sequence_length": 64,
}
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}

neuron_model = NeuronModelForSentenceTransformers.from_pretrained(
    "sentence-transformers/clip-ViT-B-32", 
    subfolder="0_CLIPModel", 
    export=True, 
    dynamic_batch_size=False, 
    **input_shapes,
    **compiler_args,
)

# Save locally
neuron_model.save_pretrained("clip_emb_neuron/")

# Upload to the HuggingFace Hub
neuron_model.push_to_hub(
    "clip_emb_neuron/", repository_id="optimum/clip_vit_emb_neuronx"  # Replace with your HF Hub repo id
)
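Because the text model above was traced with a static batch size of 1, a simple way to score a pair of sentences is to embed them one at a time and compare the results. The helper below is a hypothetical sketch (embed is not part of the library, and it assumes the tokenizer was saved alongside the compiled model):

from transformers import AutoTokenizer
from sentence_transformers import util
from optimum.neuron import NeuronModelForSentenceTransformers

tokenizer = AutoTokenizer.from_pretrained("bge_emb_neuron/")
model = NeuronModelForSentenceTransformers.from_pretrained("bge_emb_neuron/")

def embed(text):
    # One sentence per call, padded to the static sequence length used at compile time.
    inputs = tokenizer(text, padding="max_length", max_length=384, truncation=True, return_tensors="pt")
    return model(**inputs).sentence_embedding

print(util.cos_sim(embed("How do I bake bread?"), embed("A recipe for baking bread")))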

NeuronModelForSentenceTransformers

class optimum.neuron.NeuronModelForSentenceTransformers


( model: ScriptModule, config: PretrainedConfig, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, model_file_name: typing.Optional[str] = None, preprocessors: typing.Optional[typing.List] = None, neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None, **kwargs )

Parameters

  • config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
  • model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with the embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.

Neuron model for Sentence Transformers on AWS Neuron devices.

This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward


( input_ids: Tensor, attention_mask: Tensor, pixel_values: typing.Optional[torch.Tensor] = None, token_type_ids: typing.Optional[torch.Tensor] = None, **kwargs )

Parameters

  • input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
  • attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
  • pixel_values (Union[torch.Tensor, None] of shape (batch_size, num_channels, height, width), defaults to None) — Pixel values of input images, used for multimodal models such as CLIP.
  • token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.

The NeuronModelForSentenceTransformers forward method overrides the __call__ special method. It accepts only the inputs traced during the compilation step; any additional inputs provided at inference will be ignored. To include extra inputs, recompile the model with those inputs specified.

Text Example:

>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForSentenceTransformers

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bge-base-en-v1.5-neuronx")
>>> model = NeuronModelForSentenceTransformers.from_pretrained("optimum/bge-base-en-v1.5-neuronx")

>>> inputs = tokenizer("In the smouldering promise of the fall of Troy, a mythical world of gods and mortals rises from the ashes.", return_tensors="pt")

>>> outputs = model(**inputs)
>>> token_embeddings = outputs.token_embeddings
>>> sentence_embedding = outputs.sentence_embedding
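The returned sentence_embedding can be fed directly into similarity utilities; a small sketch with an arbitrary second sentence:

>>> from sentence_transformers import util
>>> other = model(**tokenizer("Gods and mortals clash in the ruins of Troy.", return_tensors="pt"))
>>> print(util.cos_sim(sentence_embedding, other.sentence_embedding))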

Image Example:

>>> from PIL import Image
>>> from transformers import AutoProcessor
>>> from sentence_transformers import util
>>> from optimum.neuron import NeuronModelForSentenceTransformers

>>> processor = AutoProcessor.from_pretrained("optimum/clip_vit_emb_neuronx")
>>> model = NeuronModelForSentenceTransformers.from_pretrained("optimum/clip_vit_emb_neuronx")
>>> util.http_get("https://github.com/UKPLab/sentence-transformers/raw/master/examples/sentence_transformer/applications/image-search/two_dogs_in_snow.jpg", "two_dogs_in_snow.jpg")
>>> inputs = processor(
...     text=["Two dogs in the snow", "A cat on a table", "A picture of London at night"],
...     images=Image.open("two_dogs_in_snow.jpg"),
...     return_tensors="pt",
...     padding=True,
... )

>>> outputs = model(**inputs)
>>> cos_scores = util.cos_sim(outputs.image_embeds, outputs.text_embeds)  # Compute cosine similarities
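To turn the similarity matrix into a prediction, take the argmax over the text candidates (a short follow-up sketch; cos_scores has shape (num_images, num_texts)):

>>> best = cos_scores.argmax().item()
>>> print(["Two dogs in the snow", "A cat on a table", "A picture of London at night"][best])  # expected: "Two dogs in the snow"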