jerryzh168 committed
Commit d906e2b · verified · 1 Parent(s): 864b9be

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -284,16 +284,17 @@ Note the result of latency (benchmark_latency) is in seconds, and serving (bench
 Int4 weight only is optimized for batch size 1 and short input and output token length, please stay tuned for models optimized for larger batch sizes or longer token length.
 
 ## Setup
-Need to install vllm nightly to get some recent changes
-```Shell
-pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
-```
 
 Get vllm source code:
 ```Shell
 git clone [email protected]:vllm-project/vllm.git
 ```
 
+Install vllm
+```
+VLLM_USE_PRECOMPILED=1 pip install --editable .
+```
+
 Run the benchmarks under `vllm` root folder:
 
 ## benchmark_latency
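
Taken together, the Setup section after this change reads as the following sequence (a sketch of the post-change steps; the `cd vllm` step is inferred from the instruction to run the benchmarks under the `vllm` root folder):

```Shell
# Clone vllm source and do an editable install that reuses precompiled
# binaries, replacing the previous nightly-wheel install.
git clone [email protected]:vllm-project/vllm.git
cd vllm   # assumed: benchmarks are run from the vllm root folder
VLLM_USE_PRECOMPILED=1 pip install --editable .
```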