mixedbread-ai/mxbai-embed-large-v1

@@ -2617,7 +2617,7 @@ pipeline_tag: feature-extraction
 # mxbai-embed-large-v1
-This is our base sentence embedding model. It was trained using [AnglE](https://arxiv.org/abs/2309.12871) loss on our high-quality large scale data. It achieves SOTA performance on BERT-large scale. Find out more in our [blog post](https://mixedbread.ai/blog/mxbai-embed-large-v1).
 ## Quickstart
@@ -2631,10 +2631,13 @@ python -m pip install -U sentence-transformers
 ```python
 from sentence_transformers import SentenceTransformer
-from sentence_transformers.util import cos_sim
-# 1. load model
-model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
 # For retrieval you need to pass this prompt.
 query = 'Represent this sentence for searching relevant passages: A man is eating a piece of bread'
@@ -2650,8 +2653,13 @@ docs = [
 # 2. Encode
 embeddings = model.encode(docs)
 similarities = cos_sim(embeddings[0], embeddings[1:])
 print('similarities:', similarities)
 ```
 ### Transformers
@@ -2669,7 +2677,7 @@ def transform_query(query: str) -> str:
     """
     return f'Represent this sentence for searching relevant passages: {query}'
-# The model works really well with cls pooling (default) but also with mean poolin.
 def pooling(outputs: torch.Tensor, inputs: Dict,  strategy: str = 'cls') -> np.ndarray:
     if strategy == 'cls':
         outputs = outputs[:, 0]
@@ -2743,7 +2751,7 @@ console.log(similarities); // [0.7919578577247139, 0.6369278664248345, 0.1651201
 You can use the model via our API as follows:
 ```python
-from mixedbread_ai.client import MixedbreadAI
 from sklearn.metrics.pairwise import cosine_similarity
 import os
@@ -2756,15 +2764,17 @@ english_sentences = [
 res = mxbai.embeddings(
      input=english_sentences,
-     model="mixedbread-ai/mxbai-embed-large-v1"
 )
-embeddings = [entry.embedding for entry in res.data]
-similarities = cosine_similarity([embeddings[0]], [embeddings[1]])
-print(similarities)
 ```
-The API comes with native INT8 and binary quantization support! Check out the [docs](https://mixedbread.ai/docs) for more information.
 ## Evaluation
 As of March 2024, our model achieves SOTA performance for BERT-large sized models on the [MTEB](https://huggingface.co/spaces/mteb/leaderboard). It outperforms commercial models like OpenAI's text-embedding-3-large and matches the performance of models 20x its size, like [echo-mistral-7b](https://huggingface.co/jspringer/echo-mistral-7b-instruct-lasttoken). Our model was trained with no overlap with the MTEB data, which indicates that it generalizes well across several domains, tasks and text lengths. We know there are some limitations with this model, which will be fixed in v2.
@@ -2785,6 +2795,14 @@ As of March 2024, our model archives SOTA performance for Bert-large sized model
 Please find more information in our [blog post](https://mixedbread.ai/blog/mxbai-embed-large-v1).
 ## Community
 Please join our [Discord Community](https://discord.gg/jDfMHzAVfU) and share your feedback and thoughts! We are here to help and also always happy to chat.

 # mxbai-embed-large-v1
+Here, we provide several ways to produce sentence embeddings. Please note that you have to prepend the prompt `Represent this sentence for searching relevant passages:` to the query if you want to use the model for retrieval. Other than that, no prompt is needed. Our model also supports Matryoshka Representation Learning and binary or int8 quantization. [Learn More](https://www.mixedbread.ai/blog/binary-mrl)
 ## Quickstart
 ```python
 from sentence_transformers import SentenceTransformer
+from sentence_transformers.util import cos_sim
+from sentence_transformers.quantization import quantize_embeddings
+# 1. Specify preferred dimensions (default is 1024)
+dimensions = 512
+# 2. load model
+model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1", truncate_dim=dimensions)
 # For retrieval you need to pass this prompt.
 query = 'Represent this sentence for searching relevant passages: A man is eating a piece of bread'
 # 3. Encode
 embeddings = model.encode(docs)
+# Optional: Quantize the embeddings
+binary_embeddings = quantize_embeddings(embeddings, precision="ubinary")
 similarities = cos_sim(embeddings[0], embeddings[1:])
 print('similarities:', similarities)
 ```
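To make the effect of `truncate_dim` more concrete, here is a minimal NumPy sketch of what Matryoshka-style truncation amounts to conceptually: keep the leading dimensions of each embedding, rescale to unit length, and compare with cosine similarity. It is not the sentence-transformers implementation. The helper names `truncate_and_normalize` and `cosine_scores` are hypothetical, and random vectors stand in for real model output.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dimensions: int) -> np.ndarray:
    """Keep the first `dimensions` values of each row and rescale rows to unit length."""
    truncated = embeddings[:, :dimensions]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return truncated / np.clip(norms, 1e-12, None)


def cosine_scores(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one unit-length query and unit-length document rows."""
    return docs @ query


# Assumption: random vectors stand in for the output of model.encode(docs).
rng = np.random.default_rng(0)
fake_embeddings = rng.standard_normal((4, 1024)).astype(np.float32)

small = truncate_and_normalize(fake_embeddings, 512)
scores = cosine_scores(small[0], small[1:])
print(small.shape, scores.shape)
```

With a real model, the first row would be the prompted query and the remaining rows the documents, as in the Quickstart above.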
 ### Transformers
     """
     return f'Represent this sentence for searching relevant passages: {query}'
+# The model works really well with cls pooling (default) but also with mean pooling.
 def pooling(outputs: torch.Tensor, inputs: Dict,  strategy: str = 'cls') -> np.ndarray:
     if strategy == 'cls':
         outputs = outputs[:, 0]
 You can use the model via our API as follows:
 ```python
+from mixedbread_ai.client import MixedbreadAI, EncodingFormat
 from sklearn.metrics.pairwise import cosine_similarity
 import os
 res = mxbai.embeddings(
      input=english_sentences,
+     model="mixedbread-ai/mxbai-embed-large-v1",
+     normalized=True,
+     encoding_format=[EncodingFormat.FLOAT, EncodingFormat.UBINARY, EncodingFormat.INT_8],
+     dimensions=512
 )
+encoded_embeddings = res.data[0].embedding
+print(res.dimensions, encoded_embeddings.ubinary, encoded_embeddings.float_, encoded_embeddings.int_8)
 ```
+The API comes with native int8 and binary quantization support! Check out the [docs](https://mixedbread.ai/docs) for more information.
 ## Evaluation
 As of March 2024, our model achieves SOTA performance for BERT-large sized models on the [MTEB](https://huggingface.co/spaces/mteb/leaderboard). It outperforms commercial models like OpenAI's text-embedding-3-large and matches the performance of models 20x its size, like [echo-mistral-7b](https://huggingface.co/jspringer/echo-mistral-7b-instruct-lasttoken). Our model was trained with no overlap with the MTEB data, which indicates that it generalizes well across several domains, tasks and text lengths. We know there are some limitations with this model, which will be fixed in v2.
 Please find more information in our [blog post](https://mixedbread.ai/blog/mxbai-embed-large-v1).
+## Matryoshka and Binary Quantization
+Embeddings in their commonly used form (float arrays) have a high memory footprint when used at scale. Two approaches to solve this problem are Matryoshka Representation Learning (MRL) and (Binary) Quantization.
+While MRL reduces the number of dimensions of an embedding, quantization converts the value of each dimension from float32 to a lower precision (int8 or even binary). <b> The model supports both approaches! </b>
+You can also take it one step further, and combine these. This combination of binary quantization and MRL allows you to reduce the memory usage of your embeddings significantly. This leads to much lower costs when using a vector database in particular. You can read more about the technology and its advantages in our [blog post](https://www.mixedbread.ai/blog/binary-mrl).
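As a rough illustration of the memory savings from combining both approaches, the sketch below truncates stand-in embeddings to 512 dimensions, thresholds each value at zero, and packs the resulting bits into bytes. The thresholding and bit packing are an assumption about how unsigned binary quantization is typically done, not a statement of the model's or the API's exact procedure. `to_ubinary` and `hamming_distance` are hypothetical helper names.

```python
import numpy as np


def to_ubinary(embeddings: np.ndarray) -> np.ndarray:
    """Threshold each value at 0 and pack every 8 bits into one uint8."""
    return np.packbits(embeddings > 0, axis=-1)


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two packed binary embeddings."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


# Assumption: random vectors stand in for real float32 model embeddings.
rng = np.random.default_rng(0)
float_embeddings = rng.standard_normal((3, 1024)).astype(np.float32)

full_bytes = float_embeddings[0].nbytes      # 1024 dims * 4 bytes
packed = to_ubinary(float_embeddings[:, :512])  # MRL truncation, then binarization
packed_bytes = packed[0].nbytes              # 512 bits / 8

print(f"float32, 1024 dims: {full_bytes} bytes per embedding")
print(f"binary, 512 dims:   {packed_bytes} bytes per embedding")
print("Hamming distance between first two:", hamming_distance(packed[0], packed[1]))
```

Under these assumptions each embedding shrinks from 4096 bytes to 64 bytes, a 64x reduction. Retrieval over binary embeddings is then usually done with Hamming distance, often followed by rescoring with higher-precision vectors.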
 ## Community
 Please join our [Discord Community](https://discord.gg/jDfMHzAVfU) and share your feedback and thoughts! We are here to help and also always happy to chat.