michaelfeil committed
Commit 4f96bd1
Parent: 221e305

add infinity example in the readme

```
docker run --gpus all -v $PWD/data:/app/.cache michaelf34/infinity:0.0.69-trt-onnx v2 --model-id dunzhang/stella_en_1.5B_v5 --batch-size 16 --device cuda --engine torch --port 7997

INFO: Started server process [1]
INFO: Waiting for application startup.
INFO 2024-11-14 05:23:56,725 infinity_emb INFO: infinity_server.py:89
Creating 1engines:
engines=['dunzhang/stella_en_1.5B_v5']
INFO 2024-11-14 05:23:56,732 infinity_emb INFO: Anonymized telemetry.py:30
telemetry can be disabled via environment variable
`DO_NOT_TRACK=1`.
INFO 2024-11-14 05:23:56,747 infinity_emb INFO: select_model.py:64
model=`dunzhang/stella_en_1.5B_v5` selected, using
engine=`torch` and device=`cuda`
INFO 2024-11-14 05:23:57,018 SentenceTransformer.py:216
sentence_transformers.SentenceTransformer
INFO: Load pretrained SentenceTransformer:
dunzhang/stella_en_1.5B_v5
A new version of the following files was downloaded from https://huggingface.co/dunzhang/stella_en_1.5B_v5:
- modeling_qwen.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/dunzhang/stella_en_1.5B_v5:
- tokenization_qwen.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
INFO 2024-11-14 05:26:29,978 SentenceTransformer.py:355
sentence_transformers.SentenceTransformer
INFO: 2 prompts are loaded, with the keys:
['s2p_query', 's2s_query']
INFO 2024-11-14 05:26:30,388 infinity_emb INFO: Adding acceleration.py:56
optimizations via Huggingface optimum.
The class `optimum.bettertransformers.transformation.BetterTransformer` is deprecated and will be removed in a future release.
WARNING 2024-11-14 05:26:30,399 infinity_emb WARNING: acceleration.py:67
BetterTransformer is not available for model: <class
'transformers_modules.dunzhang.stella_en_1.5B_v5.221
e30586ab5186c4360cbb7aeb643b6efc9d8f8.modeling_qwen.
Qwen2Model'> Continue without bettertransformer
modeling code.
INFO 2024-11-14 05:26:31,327 infinity_emb INFO: Getting select_model.py:97
timings for batch_size=16 and avg tokens per
sentence=2
3.93 ms tokenization
34.16 ms inference
0.20 ms post-processing
38.29 ms total
embeddings/sec: 417.82
INFO 2024-11-14 05:26:35,569 infinity_emb INFO: Getting select_model.py:103
timings for batch_size=16 and avg tokens per
sentence=512
9.57 ms tokenization
1973.59 ms inference
0.29 ms post-processing
1983.45 ms total
embeddings/sec: 8.07
INFO 2024-11-14 05:26:35,571 infinity_emb INFO: model select_model.py:104
warmed up, between 8.07-417.82 embeddings/sec at
batch_size=16
INFO 2024-11-14 05:26:35,600 infinity_emb INFO: batch_handler.py:386
creating batching engine
INFO 2024-11-14 05:26:35,604 infinity_emb INFO: ready batch_handler.py:453
to batch requests.
INFO 2024-11-14 05:26:35,617 infinity_emb INFO: infinity_server.py:104

♾️ Infinity - Embedding Inference Server
MIT License; Copyright (c) 2023-now Michael Feil
Version 0.0.69

Open the Docs via Swagger UI:
http://0.0.0.0:7997/docs

Access all deployed models via 'GET':
curl http://0.0.0.0:7997/models

Visit the docs for more information:
https://michaelfeil.github.io/infinity


INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:7997 (Press CTRL+C to quit)
```
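
Once Uvicorn reports that it is running, the server can be queried from another shell. The sketch below is illustrative: the `GET /models` call is taken from the startup banner above, while the `POST /embeddings` request assumes the route documented in the Swagger UI at `/docs`, and the input sentence is made up:

```
# List the deployed models (route shown in the startup banner)
curl http://0.0.0.0:7997/models

# Request embeddings; the model id matches the --model-id passed to docker run
curl http://0.0.0.0:7997/embeddings \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"model": "dunzhang/stella_en_1.5B_v5", "input": ["A sample sentence to embed"]}'
```

As noted in the log, anonymized telemetry can be disabled by passing `-e DO_NOT_TRACK=1` to `docker run`.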

Files changed (1):
README.md (the diff for this file is too large to render; see the raw diff)