alvarobartt (HF Staff) committed · Commit 00b4c3f · verified · 1 Parent(s): 97d51ed

Update README.md

Files changed (1)
  1. README.md +27 -32
README.md CHANGED
@@ -110,6 +110,32 @@ for doc, score in doc_score_pairs:
110
  print(score, doc)
111
  ```
112
 
113
  ## Technical Details
114
 
115
  In the following some technical details how this model must be used:
@@ -125,7 +151,6 @@ Note: When loaded with `sentence-transformers`, this model produces normalized e
125
 
126
  ----
127
 
128
-
129
  ## Background
130
 
131
  The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised
@@ -142,8 +167,6 @@ Our model is intended to be used for semantic search: It encodes queries / quest
142
 
143
  Note that there is a limit of 512 word pieces: Text longer than that will be truncated. Further note that the model was just trained on input text up to 250 word pieces. It might not work well for longer text.
144
 
145
-
146
-
147
  ## Training procedure
148
 
149
  The full training script is accessible in this current repository: `train_script.py`.
@@ -160,8 +183,6 @@ We sampled each dataset given a weighted probability which configuration is deta
160
  The model was trained with [MultipleNegativesRankingLoss](https://www.sbert.net/docs/package_reference/losses.html#multiplenegativesrankingloss) using Mean-pooling, cosine-similarity as similarity function, and a scale of 20.
161
 
162
 
163
-
164
-
165
  | Dataset | Number of training tuples |
166
  |--------------------------------------------------------|:--------------------------:|
167
  | [WikiAnswers](https://github.com/afader/oqa#wikianswers-corpus) Duplicate question pairs from WikiAnswers | 77,427,422 |
@@ -181,30 +202,4 @@ The model was trained with [MultipleNegativesRankingLoss](https://www.sbert.net/
181
  | [Natural Questions (NQ)](https://ai.google.com/research/NaturalQuestions) (Question, Paragraph) pairs for 100k real Google queries with relevant Wikipedia paragraph | 100,231 |
182
  | [SQuAD2.0](https://rajpurkar.github.io/SQuAD-explorer/) (Question, Paragraph) pairs from SQuAD2.0 dataset | 87,599 |
183
  | [TriviaQA](https://huggingface.co/datasets/trivia_qa) (Question, Evidence) pairs | 73,346 |
184
- | **Total** | **214,988,242** |
185
-
186
- ## Usage (Text Embeddings Inference (TEI))
187
-
188
- [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference) is a blazing fast inference solution for text embeddings models.
189
-
190
- - CPU:
191
- ```bash
192
- docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id sentence-transformers/multi-qa-mpnet-base-cos-v1 --pooling mean --dtype float16
193
- ```
194
-
195
- - NVIDIA GPU:
196
- ```bash
197
- docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-latest --model-id sentence-transformers/multi-qa-mpnet-base-cos-v1 --pooling mean --dtype float16
198
- ```
199
-
200
- Send a request to `/v1/embeddings` to generate embeddings via the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings/create):
201
- ```bash
202
- curl http://localhost:8080/v1/embeddings \
203
- -H "Content-Type: application/json" \
204
- -d '{
205
- "model": "sentence-transformers/multi-qa-mpnet-base-cos-v1",
206
- "input": "How many people live in London?"
207
- }'
208
- ```
209
-
210
- Or check the [Text Embeddings Inference API specification](https://huggingface.github.io/text-embeddings-inference/) instead.
 
110
  print(score, doc)
111
  ```
112
 
113
+ ## Usage (Text Embeddings Inference (TEI))
114
+
115
+ [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference) is a blazing fast inference solution for text embeddings models.
116
+
117
+ - CPU:
118
+ ```bash
119
+ docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id sentence-transformers/multi-qa-mpnet-base-cos-v1 --pooling mean --dtype float16
120
+ ```
121
+
122
+ - NVIDIA GPU:
123
+ ```bash
124
+ docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-latest --model-id sentence-transformers/multi-qa-mpnet-base-cos-v1 --pooling mean --dtype float16
125
+ ```
126
+
127
+ Send a request to `/v1/embeddings` to generate embeddings via the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings/create):
128
+ ```bash
129
+ curl http://localhost:8080/v1/embeddings \
130
+ -H "Content-Type: application/json" \
131
+ -d '{
132
+ "model": "sentence-transformers/multi-qa-mpnet-base-cos-v1",
133
+ "input": "How many people live in London?"
134
+ }'
135
+ ```
136
+
137
+ Or check the [Text Embeddings Inference API specification](https://huggingface.github.io/text-embeddings-inference/) instead.
138
+
139
  ## Technical Details
140
 
141
  In the following some technical details how this model must be used:
 
151
 
152
  ----
153
 
154
  ## Background
155
 
156
  The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised
 
167
 
168
  Note that there is a limit of 512 word pieces: Text longer than that will be truncated. Further note that the model was just trained on input text up to 250 word pieces. It might not work well for longer text.
169
 
170
  ## Training procedure
171
 
172
  The full training script is accessible in this current repository: `train_script.py`.
 
183
  The model was trained with [MultipleNegativesRankingLoss](https://www.sbert.net/docs/package_reference/losses.html#multiplenegativesrankingloss) using Mean-pooling, cosine-similarity as similarity function, and a scale of 20.
184
 
185
 
186
  | Dataset | Number of training tuples |
187
  |--------------------------------------------------------|:--------------------------:|
188
  | [WikiAnswers](https://github.com/afader/oqa#wikianswers-corpus) Duplicate question pairs from WikiAnswers | 77,427,422 |
 
202
  | [Natural Questions (NQ)](https://ai.google.com/research/NaturalQuestions) (Question, Paragraph) pairs for 100k real Google queries with relevant Wikipedia paragraph | 100,231 |
203
  | [SQuAD2.0](https://rajpurkar.github.io/SQuAD-explorer/) (Question, Paragraph) pairs from SQuAD2.0 dataset | 87,599 |
204
  | [TriviaQA](https://huggingface.co/datasets/trivia_qa) (Question, Evidence) pairs | 73,346 |
205
+ | **Total** | **214,988,242** |
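
For readers who want to call the relocated TEI section from code rather than `curl`, the request can be sketched in Python using only the standard library. This is a minimal sketch, not part of the commit: it assumes a TEI container started with one of the `docker run` commands in the diff is listening on `localhost:8080`, and that the response follows the OpenAI embeddings layout (a `data` list whose items carry `index` and `embedding`). The helper names `build_request`, `parse_embeddings`, and `embed` are hypothetical.

```python
import json
import urllib.request

# Assumption: TEI is running locally as in the `docker run` commands of the diff.
TEI_URL = "http://localhost:8080/v1/embeddings"
MODEL_ID = "sentence-transformers/multi-qa-mpnet-base-cos-v1"


def build_request(texts, url=TEI_URL, model=MODEL_ID):
    """Build the same POST request as the curl example, for one or more texts."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(body):
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts):
    """Send the request and return one embedding (list of floats) per input."""
    with urllib.request.urlopen(build_request(texts)) as response:
        return parse_embeddings(json.load(response))
```

With the server up, `embed(["How many people live in London?"])` would return a list containing one vector. Building the request and parsing the response are kept separate from the network call so that both can be checked without a running container.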