Update README.md
README.md
Our int4wo is only optimized for batch size 1, so we'll only benchmark the batch size 1 performance with vllm.
For batch size N, please see our [gemlite checkpoint](https://huggingface.co/jerryzh168/phi4-mini-int4wo-gemlite).

# Download vllm source code and install vllm
```
git clone git@github.com:vllm-project/vllm.git
cd vllm
# VLLM_USE_PRECOMPILED=1 skips the CUDA/C++ compilation and reuses prebuilt binaries
VLLM_USE_PRECOMPILED=1 pip install .
```
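
Optionally, a quick sanity check that the freshly installed vllm imports cleanly (it just prints the installed version):
```
python -c "import vllm; print(vllm.__version__)"
```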

# Download dataset
Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/...`

Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks

# benchmark_latency

Run the following under the `vllm` source code root folder:

## baseline
```
python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model microsoft/Phi-4-mini-instruct --batch-size 1
```
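
The quantized checkpoint can presumably be benchmarked the same way by swapping `--model`; a hypothetical invocation, assuming our int4wo checkpoint `jerryzh168/phi4-mini-int4wo-hqq` loads directly through the regular Hugging Face model path:
```
python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model jerryzh168/phi4-mini-int4wo-hqq --batch-size 1
```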

We also benchmarked the throughput in a serving environment.

Run the following under the `vllm` source code root folder:

## baseline
Server:
```
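# NOTE: sketch of a typical OpenAI-compatible server launch for the baseline model,
# assuming default host/port; the exact flags behind the published numbers may differ.
vllm serve microsoft/Phi-4-mini-instruct
```

On the client side, vllm's `benchmarks/benchmark_serving.py` can then drive the server with the sharegpt requests; again a sketch, with the dataset path left as a placeholder:
```
python benchmarks/benchmark_serving.py --backend vllm \
  --model microsoft/Phi-4-mini-instruct \
  --dataset-name sharegpt \
  --dataset-path <path/to/ShareGPT json> \
  --num-prompts 100
```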