Update README.md
README.md CHANGED
@@ -109,7 +109,7 @@ lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks
 
 ## int4wo-hqq
 ```
-lm_eval --model hf --model_args pretrained=
+lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-int4wo-hqq --tasks hellaswag --device cuda:0 --batch_size 8
 ```
 
 `TODO: more complete eval results`
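For reference, a checkpoint such as `pytorch/Phi-4-mini-instruct-int4wo-hqq` can be produced with torchao's int4 weight-only quantization, using HQQ to choose the quantization parameters. A minimal sketch, assuming torchao's `quantize_` / `int4_weight_only` API; the `group_size` value and output path are illustrative assumptions, not taken from this README:

```python
# Sketch: building an int4 weight-only + HQQ checkpoint with torchao.
# group_size=128 and the save path are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from torchao.quantization import quantize_, int4_weight_only

model_id = "microsoft/Phi-4-mini-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantize linear weights to int4, with HQQ computing scales/zero points.
quantize_(model, int4_weight_only(group_size=128, use_hqq=True))

# torchao tensor subclasses may not serialize to safetensors,
# hence safe_serialization=False here.
model.save_pretrained("Phi-4-mini-instruct-int4wo-hqq", safe_serialization=False)
tokenizer.save_pretrained("Phi-4-mini-instruct-int4wo-hqq")
```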
@@ -162,7 +162,7 @@ python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
 
 ### int4wo-hqq
 ```
-python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
+python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model pytorch/Phi-4-mini-instruct-int4wo-hqq --batch-size 1
 ```
 
 ## benchmark_serving
@@ -186,16 +186,16 @@ python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --
 ### int4wo-hqq
 Server:
 ```
-vllm serve
+vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
 ```
 
 Client:
 ```
-python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --tokenizer microsoft/Phi-4-mini-instruct --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json --model
+python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --tokenizer microsoft/Phi-4-mini-instruct --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json --model pytorch/Phi-4-mini-instruct-int4wo-hqq --num-prompts 1
 ```
 
 # Serving with vllm
 We can use the same command we used in serving benchmarks to serve the model with vllm
 ```
-vllm serve
+vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
 ```
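The client command above reads `./ShareGPT_V3_unfiltered_cleaned_split.json` from the working directory. If the file is not already present, one common source is the `anon8231489123/ShareGPT_Vicuna_unfiltered` dataset on the Hugging Face Hub; that repo id is an assumption based on where vllm's own docs fetch this file, and any copy of the JSON works:

```python
# Sketch: fetching the ShareGPT split used by benchmark_serving.py.
# The repo id below is an assumption, not taken from this README.
import shutil
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="anon8231489123/ShareGPT_Vicuna_unfiltered",
    filename="ShareGPT_V3_unfiltered_cleaned_split.json",
    repo_type="dataset",
)
shutil.copy(path, "./ShareGPT_V3_unfiltered_cleaned_split.json")
```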
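Once `vllm serve` is up, the model can be queried over vllm's OpenAI-compatible API, which by default listens on port 8000. A minimal sketch using the `openai` client; the prompt and `max_tokens` value are arbitrary:

```python
# Sketch: chatting with the served int4wo-hqq model through the
# OpenAI-compatible endpoint that `vllm serve` exposes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="pytorch/Phi-4-mini-instruct-int4wo-hqq",
    messages=[
        {"role": "user", "content": "Summarize int4 weight-only quantization in one sentence."}
    ],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```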