Update README.md
README.md CHANGED
@@ -210,10 +210,6 @@ and decode tokens per second will be more important than time to first token.
Note that the latency result (benchmark_latency) is in seconds, and the serving result (benchmark_serving) is in requests per second.
Int4 weight only is optimized for batch size 1 and short input and output token lengths; please stay tuned for models optimized for larger batch sizes or longer token lengths.

-## Download dataset
-Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`
-
-Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks
## benchmark_latency

Need to install vllm nightly to get some recent changes.
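A minimal sketch of one way to install a vllm nightly build, assuming vllm publishes pre-release wheels at the index URL below; the URL is an assumption, so verify against vllm's installation docs:

```
# Assumed nightly wheel index -- check vllm's installation docs before relying on it.
pip install --upgrade --pre vllm --extra-index-url https://wheels.vllm.ai/nightly
```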
@@ -242,8 +238,15 @@ python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model

We also benchmarked the throughput in a serving environment.

+Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`
+Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks

+Get vllm source code:
+```
+git clone [email protected]:vllm-project/vllm.git
+```
+
+Run the following under the `vllm` root folder:
-

### baseline
Server:
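Once the dataset is downloaded and vllm is cloned, the serving benchmark is a server process plus `benchmarks/benchmark_serving.py` as the client. The sketch below is illustrative only: the model name is a placeholder, and benchmark_serving.py flags vary across vllm versions, so treat every flag as an assumption and check `python benchmarks/benchmark_serving.py --help`.

```
# Hypothetical sketch of the serving benchmark, run from the vllm root folder.
# Model name and flags are assumptions; vllm versions differ.

# Terminal 1: start vllm's OpenAI-compatible server.
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf

# Terminal 2: drive load against it with the ShareGPT dataset downloaded above.
python benchmarks/benchmark_serving.py \
    --backend vllm \
    --model meta-llama/Llama-2-7b-chat-hf \
    --dataset-name sharegpt \
    --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
    --num-prompts 100
```

The client reports throughput in requests per second, matching the note above about benchmark_serving results.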