bwang0911 committed on
Commit
a046580
·
1 Parent(s): a59de4f

Update README.md

Files changed (1)
  1. README.md +11 -11
README.md CHANGED
@@ -2620,9 +2620,9 @@ model-index:
 
  ## Intended Usage & Model Info
 
- `jina-embedding-s-en-v2` is an English, monolingual **embedding model** supporting **8192 sequence length**.
  It is based on a Bert architecture (JinaBert) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence length.
- The backbone `jina-bert-s-en-v2` is pretrained on the C4 dataset.
  The model is further trained on Jina AI's collection of more than 400 millions of sentence pairs and hard negatives.
  These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.
 
@@ -2634,21 +2634,21 @@ Additionally, we provide the following embedding models:
 
  ### V1 (Based on T5, 512 Seq)
 
- - [`jina-embedding-s-en-v1`](https://huggingface.co/jinaai/jina-embedding-s-en-v1): 35 million parameters.
- - [`jina-embedding-b-en-v1`](https://huggingface.co/jinaai/jina-embedding-b-en-v1): 110 million parameters.
- - [`jina-embedding-l-en-v1`](https://huggingface.co/jinaai/jina-embedding-l-en-v1): 330 million parameters.
 
  ### V2 (Based on JinaBert, 8k Seq)
 
- - [`jina-embedding-s-en-v2`](https://huggingface.co/jinaai/jina-embedding-s-en-v2): 33 million parameters **(you are here)**.
- - [`jina-embedding-b-en-v2`](https://huggingface.co/jinaai/jina-embedding-b-en-v2): 137 million parameters.
- - [`jina-embedding-l-en-v2`]: 435 million parameters (releasing soon).
 
  ## Data & Parameters
 
- Jina Embedding V2 technical report coming soon.
 
- Jina Embedding V1 [technical report](https://arxiv.org/abs/2307.11224).
 
  ## Usage
 
@@ -2659,7 +2659,7 @@ from transformers import AutoModel
  from numpy.linalg import norm
 
  cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
- model = AutoModel.from_pretrained('jinaai/jina-embedding-s-en-v2', trust_remote_code=True) # trust_remote_code is needed to use the encode method
  embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])
  print(cos_sim(embeddings[0], embeddings[1]))
  ```
 
 
  ## Intended Usage & Model Info
 
+ `jina-embeddings-v2-small-en` is an English, monolingual **embedding model** supporting **8192 sequence length**.
  It is based on a Bert architecture (JinaBert) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence length.
+ The backbone `jina-bert-v2-small-en` is pretrained on the C4 dataset.
  The model is further trained on Jina AI's collection of more than 400 millions of sentence pairs and hard negatives.
  These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.
 
 
 
  ### V1 (Based on T5, 512 Seq)
 
+ - [`jina-embeddings-v1-small-en`](https://huggingface.co/jinaai/jina-embedding-s-en-v1): 35 million parameters.
+ - [`jina-embeddings-v1-base-en`](https://huggingface.co/jinaai/jina-embedding-b-en-v1): 110 million parameters.
+ - [`jina-embeddings-v2-large-en`](https://huggingface.co/jinaai/jina-embedding-l-en-v1): 330 million parameters.
 
  ### V2 (Based on JinaBert, 8k Seq)
 
+ - [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters **(you are here)**.
+ - [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
+ - [`jina-embeddings-v2-large-en`](): 435 million parameters (releasing soon).
 
  ## Data & Parameters
 
+ Jina Embeddings V2 technical report coming soon.
 
+ Jina Embeddings V1 [technical report](https://arxiv.org/abs/2307.11224).
 
  ## Usage
 
 
  from numpy.linalg import norm
 
  cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
+ model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-small-en', trust_remote_code=True) # trust_remote_code is needed to use the encode method
  embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])
  print(cos_sim(embeddings[0], embeddings[1]))
  ```
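
The `cos_sim` lambda in the usage snippet can be tried without downloading the model. The following is a minimal sketch in plain Python, assuming 1-D vectors of equal length; the vectors `emb_a`, `emb_b`, and `emb_c` are hypothetical stand-ins for illustration, not output of `model.encode`.

```python
import math

def cos_sim(a, b):
    """Cosine similarity of two equal-length 1-D vectors: (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical stand-in vectors (assumption); real embeddings would come from model.encode(...)
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]   # points in the same direction as emb_a
emb_c = [3.0, 0.0, -1.0]  # dot product with emb_a is zero

print(cos_sim(emb_a, emb_b))  # close to 1: same direction
print(cos_sim(emb_a, emb_c))  # close to 0: orthogonal
```

Because the score depends only on direction, scaling a vector leaves it unchanged, which is why `emb_a` and `emb_b` score as near-identical.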