Update README.md
README.md
CHANGED
@@ -15,9 +15,23 @@ license: apache-2.0
The text embedding suite trained by [Jina AI](https://github.com/jina-ai), [Finetuner team](https://github.com/jina-ai/finetuner).

-## Intended Usage

-
+## Intended Usage & Model Info

+`jina-embedding-s-en-v1` is a language model trained on Jina AI's Linnaeus-Clean dataset.
+This dataset consists of 380 million sentence pairs, including query-document pairs.
+These pairs were collected from various domains and selected through a thorough cleaning process.
+The Linnaeus-Full dataset, from which Linnaeus-Clean is derived, originally contained 1.6 billion sentence pairs.
+
+The model has a range of use cases, including information retrieval, semantic textual similarity, text reranking, and more.
+
+With a compact size of just 35 million parameters,
+the model enables fast inference while still delivering strong performance.
+Additionally, we provide the following options:
+
+- `jina-embedding-b-en-v1`: 110 million parameters.
+- `jina-embedding-l-en-v1`: 800 million parameters.
+- `jina-embedding-xl-en-v1`: 3 billion parameters.
+- `jina-embedding-xxl-en-v1`: 11 billion parameters.

## Data & Parameters
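
The use cases named in the added text (information retrieval, semantic textual similarity, reranking) all come down to comparing embedding vectors. The following is a minimal sketch of that comparison step only, assuming cosine similarity as the scoring function (a common choice, not confirmed by this diff). The helper names `cosine_similarity` and `rank_documents` and the toy 3-dimensional vectors are hypothetical; in practice the vectors would be produced by `jina-embedding-s-en-v1`, and how the model is loaded is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real model output (hypothetical values).
query = [1.0, 0.0, 1.0]
docs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_documents(query, docs)
print([index for index, _ in ranking])
```

The same ranking function covers both retrieval (score every document against a query) and reranking (score only a shortlist); semantic textual similarity is the single-pair case, a direct call to `cosine_similarity`.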