Add Infinity example to the README

#32
docker run --gpus all -v $PWD/data:/app/.cache michaelf34/infinity:0.0.69-trt-onnx v2 --model-id dunzhang/stella_en_1.5B_v5 --batch-size 16 --device cuda --engine torch --port 7997 

INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO     2024-11-14 05:23:56,725 infinity_emb INFO:        infinity_server.py:89
         Creating 1engines:                                                     
         engines=['dunzhang/stella_en_1.5B_v5']                                 
INFO     2024-11-14 05:23:56,732 infinity_emb INFO: Anonymized   telemetry.py:30
         telemetry can be disabled via environment variable                     
         `DO_NOT_TRACK=1`.                                                      
INFO     2024-11-14 05:23:56,747 infinity_emb INFO:           select_model.py:64
         model=`dunzhang/stella_en_1.5B_v5` selected, using                     
         engine=`torch` and device=`cuda`                                       
INFO     2024-11-14 05:23:57,018                      SentenceTransformer.py:216
         sentence_transformers.SentenceTransformer                              
         INFO: Load pretrained SentenceTransformer:                             
         dunzhang/stella_en_1.5B_v5                                             
A new version of the following files was downloaded from https://huggingface.co/dunzhang/stella_en_1.5B_v5:
- modeling_qwen.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/dunzhang/stella_en_1.5B_v5:
- tokenization_qwen.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
INFO     2024-11-14 05:26:29,978                      SentenceTransformer.py:355
         sentence_transformers.SentenceTransformer                              
         INFO: 2 prompts are loaded, with the keys:                             
         ['s2p_query', 's2s_query']                                             
INFO     2024-11-14 05:26:30,388 infinity_emb INFO: Adding    acceleration.py:56
         optimizations via Huggingface optimum.                                 
The class `optimum.bettertransformers.transformation.BetterTransformer` is deprecated and will be removed in a future release.
WARNING  2024-11-14 05:26:30,399 infinity_emb WARNING:        acceleration.py:67
         BetterTransformer is not available for model: <class                   
         'transformers_modules.dunzhang.stella_en_1.5B_v5.221                   
         e30586ab5186c4360cbb7aeb643b6efc9d8f8.modeling_qwen.                   
         Qwen2Model'> Continue without bettertransformer                        
         modeling code.                                                         
INFO     2024-11-14 05:26:31,327 infinity_emb INFO: Getting   select_model.py:97
         timings for batch_size=16 and avg tokens per                           
         sentence=2                                                             
                 3.93     ms tokenization                                       
                 34.16    ms inference                                          
                 0.20     ms post-processing                                    
                 38.29    ms total                                              
         embeddings/sec: 417.82                                                 
INFO     2024-11-14 05:26:35,569 infinity_emb INFO: Getting  select_model.py:103
         timings for batch_size=16 and avg tokens per                           
         sentence=512                                                           
                 9.57     ms tokenization                                       
                 1973.59          ms inference                                  
                 0.29     ms post-processing                                    
                 1983.45          ms total                                      
         embeddings/sec: 8.07                                                   
INFO     2024-11-14 05:26:35,571 infinity_emb INFO: model    select_model.py:104
         warmed up, between 8.07-417.82 embeddings/sec at                       
         batch_size=16                                                          
INFO     2024-11-14 05:26:35,600 infinity_emb INFO:         batch_handler.py:386
         creating batching engine                                               
INFO     2024-11-14 05:26:35,604 infinity_emb INFO: ready   batch_handler.py:453
         to batch requests.                                                     
INFO     2024-11-14 05:26:35,617 infinity_emb INFO:       infinity_server.py:104
                                                                                
         ♾️  Infinity - Embedding Inference Server                               
         MIT License; Copyright (c) 2023-now Michael Feil                       
         Version 0.0.69                                                         
                                                                                
         Open the Docs via Swagger UI:                                          
         http://0.0.0.0:7997/docs                                               
                                                                                
         Access all deployed models via 'GET':                                  
         curl http://0.0.0.0:7997/models                                        
                                                                                
         Visit the docs for more information:                                   
         https://michaelfeil.github.io/infinity                                 
                                                                                
                                                                                
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7997 (Press CTRL+C to quit)
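
Once the log shows `Application startup complete`, the server can be queried over HTTP. The following is a minimal sketch, not taken from the PR. It assumes Infinity exposes an OpenAI-style `POST /embeddings` route at the root (consistent with `/models` and `/docs` in the banner above) and that port 7997 is reachable from wherever the script runs. The `docker run` command above does not publish the port, so adding something like `-p 7997:7997` may be needed. `BASE_URL`, `build_request`, and `embed` are illustrative names.

```python
import json
import urllib.request

# Port taken from the docker command; the host is an assumption.
BASE_URL = "http://localhost:7997"
MODEL_ID = "dunzhang/stella_en_1.5B_v5"


def build_request(texts, base_url=BASE_URL, model=MODEL_ID):
    """Build a POST request for the (assumed) OpenAI-style /embeddings route."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, timeout=60):
    """Send the texts to the server and return one embedding list per text."""
    with urllib.request.urlopen(build_request(texts), timeout=timeout) as resp:
        body = json.load(resp)
    # Assumed response shape: {"data": [{"embedding": [...]}, ...], ...}
    return [item["embedding"] for item in body["data"]]


if __name__ == "__main__":
    vectors = embed(["What is Infinity?"])
    print(len(vectors), "embedding(s) of dimension", len(vectors[0]))
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.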

@infgrad Can you review?

This model supports multiple dimensions. How do I choose a dimension with infinity? @michaelfeil

Cannot merge
This branch has merge conflicts in the following files:
  • README.md
