michaelfeil committed
Commit a70a1b2
1 Parent(s): f47e3b5

update README with instructions for usage with infinity


Please merge this PR for a documentation update.

Launched on an A100-40G (32 GB VRAM usage) with batch-size=16:

```
INFO 2024-11-12 22:13:40,975 infinity_emb INFO: Creating 1 engines: engines=['Alibaba-NLP/gte-Qwen2-7B-instruct']  infinity_server.py:89
INFO 2024-11-12 22:13:40,979 infinity_emb INFO: Anonymized telemetry can be disabled via environment variable `DO_NOT_TRACK=1`.  telemetry.py:30
INFO 2024-11-12 22:13:40,987 infinity_emb INFO: model=`Alibaba-NLP/gte-Qwen2-7B-instruct` selected, using engine=`torch` and device=`cuda`  select_model.py:64
INFO 2024-11-12 22:13:41,188 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer: Alibaba-NLP/gte-Qwen2-7B-instruct  SentenceTransformer.py:216

INFO 2024-11-12 22:41:25,069 sentence_transformers.SentenceTransformer INFO: 1 prompts are loaded, with the keys: ['query']  SentenceTransformer.py:355
INFO 2024-11-12 22:41:26,143 infinity_emb INFO: Getting timings for batch_size=16 and avg tokens per sentence=2  select_model.py:97
        2.64 ms tokenization
        32.47 ms inference
        0.25 ms post-processing
        35.36 ms total
    embeddings/sec: 452.54
INFO 2024-11-12 22:41:27,721 infinity_emb INFO: Getting timings for batch_size=16 and avg tokens per sentence=513  select_model.py:103
        7.76 ms tokenization
        765.84 ms inference
        0.53 ms post-processing
        774.13 ms total
    embeddings/sec: 20.67
```
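The throughput figures above follow directly from the logged totals: embeddings/sec is batch_size divided by the total per-batch latency. A quick sanity check of that arithmetic (the results differ from the log in the second decimal because the server rounds the component timings before printing):

```python
# embeddings/sec = batch_size / total batch latency (seconds)
batch_size = 16

# avg 2 tokens per sentence: 35.36 ms total per batch
short_total_s = 35.36e-3
print(f"{batch_size / short_total_s:.2f} embeddings/sec")  # prints 452.49

# avg 513 tokens per sentence: 774.13 ms total per batch
long_total_s = 774.13e-3
print(f"{batch_size / long_total_s:.2f} embeddings/sec")   # prints 20.67
```

Note the ~22x throughput drop between 2-token and 513-token inputs: inference time dominates and scales with sequence length.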

Files changed (1)

1. README.md +12 -0

README.md CHANGED
````diff
@@ -5622,6 +5622,18 @@ scores = (embeddings[:2] @ embeddings[2:].T) * 100
 print(scores.tolist())
 ```
 
+## Infinity_emb
+
+Usage via [infinity](https://github.com/michaelfeil/infinity), an MIT-licensed inference server.
+
+```
+# requires ~16-32 GB VRAM, NVIDIA Compute Capability >= 8.0
+docker run \
+  -v $PWD/data:/app/.cache --gpus "0" -p "7997":"7997" \
+  michaelf34/infinity:0.0.68-trt-onnx \
+  v2 --model-id Alibaba-NLP/gte-Qwen2-7B-instruct --revision "refs/pr/38" --dtype bfloat16 --batch-size 8 --device cuda --engine torch --port 7997 --no-bettertransformer
+```
+
 ## Evaluation
 
 ### MTEB & C-MTEB
````
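Once the container from the added README section is running, the server can be queried over its OpenAI-compatible `/embeddings` route. A minimal client sketch, assuming the server is reachable on localhost:7997 (the input sentences are placeholders):

```python
import json
from urllib import request

# Request body in the OpenAI embeddings format that infinity accepts.
payload = {
    "model": "Alibaba-NLP/gte-Qwen2-7B-instruct",
    "input": [
        "what is the capital of China?",
        "how to implement quick sort in python?",
    ],
}

def embed(url: str = "http://localhost:7997/embeddings") -> list:
    """POST the payload and return the list of embedding objects."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["data"]

# embeddings = embed()  # each item in the list carries an "embedding" vector
```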