Update README.md
README.md
Our int4wo is only optimized for batch size 1, so we'll only benchmark the batch size 1 performance with vllm.
For batch size N, please see our [gemlite checkpoint](https://huggingface.co/jerryzh168/phi4-mini-int4wo-gemlite).

# Download vllm source code and install vllm
```
git clone git@github.com:vllm-project/vllm.git
cd vllm
# VLLM_USE_PRECOMPILED=1 skips the CUDA/C++ compilation and reuses prebuilt binaries
VLLM_USE_PRECOMPILED=1 pip install .
```
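
Optionally, a quick sanity check that the freshly installed vllm imports cleanly (it just prints the installed version):
```
python -c "import vllm; print(vllm.__version__)"
```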

# Download dataset
Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/...`

Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks

# benchmark_latency

Run the following under the `vllm` source code root folder:

## baseline
```
python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model microsoft/Phi-4-mini-instruct --batch-size 1
```
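
The quantized checkpoint can presumably be benchmarked the same way by swapping `--model`; a hypothetical invocation, assuming our int4wo checkpoint `jerryzh168/phi4-mini-int4wo-hqq` loads directly through the regular Hugging Face model path:
```
python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model jerryzh168/phi4-mini-int4wo-hqq --batch-size 1
```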

We also benchmarked the throughput in a serving environment.

Run the following under the `vllm` source code root folder:

## baseline
Server:
```
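# NOTE: sketch of a typical OpenAI-compatible server launch for the baseline model,
# assuming default host/port; the exact flags behind the published numbers may differ.
vllm serve microsoft/Phi-4-mini-instruct
```

On the client side, vllm's `benchmarks/benchmark_serving.py` can then drive the server with the sharegpt requests; again a sketch, with the dataset path left as a placeholder:
```
python benchmarks/benchmark_serving.py --backend vllm \
  --model microsoft/Phi-4-mini-instruct \
  --dataset-name sharegpt \
  --dataset-path <path/to/ShareGPT json> \
  --num-prompts 100
```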