michael-guenther
committed on
Update README.md
README.md CHANGED

</p>
</details>

You can use Jina Embedding models directly from the transformers package.

First, you need to make sure that you are logged in to Hugging Face. You can either use the huggingface-cli tool (after installing the `transformers` package) and pass your [Hugging Face access token](https://huggingface.co/docs/hub/security-tokens):
```bash
huggingface-cli login
```
Alternatively, you can provide the access token as an environment variable in the shell:
```bash
export HF_TOKEN="<your token here>"
```
or in Python:
```python
import os

os.environ['HF_TOKEN'] = "<your token here>"
```
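
If you prefer not to set the token globally, the token can also be passed directly when loading the model; this is a minimal sketch rather than part of the original instructions, and on older transformers releases the argument is called `use_auth_token` instead of `token`:

```python
from transformers import AutoModel

# Sketch: pass the access token directly when loading the gated model.
# `token=` is the current argument name; older transformers versions
# expect `use_auth_token=` instead.
model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-base-en",
    token="<your token here>",
    trust_remote_code=True,
)
```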

Then, you can load and use the model via the `AutoModel` class:
```python
!pip install transformers
from transformers import AutoModel
# [...]
)
```
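
As a quick sanity check (a sketch that is not part of the original example), you can compare two of the returned vectors with cosine similarity; this assumes `model.encode` returns array-like embeddings such as numpy arrays:

```python
import numpy as np

def cosine_similarity(a, b):
    # Plain cosine similarity between two embedding vectors.
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Assuming `embeddings` was produced by model.encode(...) above:
# print(cosine_similarity(embeddings[0], embeddings[1]))
```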

As of its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (please make sure that you are logged in to Hugging Face as well):
```python
!pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer(
    "jinaai/jina-embeddings-v2-small-en", # switch to en/zh for English or Chinese
    trust_remote_code=True
)

# control your input sequence length up to 8192
model.max_seq_length = 1024

embeddings = model.encode([
    'How is the weather today?',
    'Wie ist das Wetter heute?'
])
print(cos_sim(embeddings[0], embeddings[1]))
```
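
For larger inputs you can also batch the call and request normalized vectors, so that a plain dot product equals cosine similarity; this is a sketch using the standard `SentenceTransformer.encode` options `batch_size` and `normalize_embeddings`, and the corpus below is only an illustrative placeholder:

```python
# Sketch: encode a small corpus in batches with L2-normalized embeddings.
corpus = [
    'How is the weather today?',
    'Wie ist das Wetter heute?',
    'What is the weather like today?',
]
corpus_embeddings = model.encode(corpus, batch_size=32, normalize_embeddings=True)

query_embedding = model.encode('Is it sunny right now?', normalize_embeddings=True)

# With normalized vectors the dot product is the cosine similarity.
scores = corpus_embeddings @ query_embedding
print(scores)
```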

## Alternatives to Using Transformers Package

1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).

[...]

2. Multimodal embedding models enable Multimodal RAG applications.
3. High-performance rerankers.

## Troubleshooting

**Loading of Model Code failed**

If you forgot to pass the `trust_remote_code=True` flag when calling `AutoModel.from_pretrained` or initializing the model via the `SentenceTransformer` class, you will receive an error that the model weights could not be initialized.
This is caused by transformers falling back to creating a default BERT model instead of a jina-embedding model:

```bash
Some weights of the model checkpoint at jinaai/jina-embeddings-v2-base-en were not used when initializing BertModel: ['encoder.layer.2.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.8.mlp.layernorm.bias', ...
```
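
The fix is simply to pass the flag when loading; a minimal sketch follows (the printed class name is only a rough check, and the exact remote class name is not specified here):

```python
from transformers import AutoModel

# Passing trust_remote_code=True lets transformers use the model code shipped
# with the repository instead of silently falling back to a default BertModel.
model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-base-en",
    trust_remote_code=True,
)
print(type(model).__name__)  # should not be a plain BertModel
```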

**User is not logged in to Hugging Face**

The model is only available under [gated access](https://huggingface.co/docs/hub/models-gated).
This means you need to be logged in to Hugging Face to load it.
If you receive the following error, you need to provide an access token, either by using the huggingface-cli or by providing the token via an environment variable as described above:
```bash
OSError: jinaai/jina-embeddings-v2-base-en is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
```
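
Besides the CLI and the environment variable, you can also log in from Python with `huggingface_hub`; this is a sketch, and the `login` helper stores the token locally so that subsequent `from_pretrained` calls can use it:

```python
from huggingface_hub import login

# Sketch: programmatic login as an alternative to `huggingface-cli login`.
# Create a read-scoped access token in your Hugging Face account settings.
login(token="<your token here>")
```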

## Contact

Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.