Update README.md
README.md CHANGED
@@ -150,12 +150,6 @@ Our int4wo is only optimized for batch size 1, so we'll only benchmark the batch
 Note the result of latency (benchmark_latency) is in seconds, and serving (benchmark_serving) is in number of requests per second.
 Int4 weight only is optimized for batch size 1 and short input and output token lengths; please stay tuned for models optimized for larger batch sizes or longer token lengths.
 
-## Download vllm source code and install vllm
-```
-git clone git@github.com:vllm-project/vllm.git
-VLLM_USE_PRECOMPILED=1 pip install .
-```
-
 ## Download dataset
 Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`
 
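The context lines above refer to vLLM's `benchmark_latency.py` and `benchmark_serving.py` scripts, run from a vllm checkout. As a rough sketch only (flag names vary between vLLM versions, and the model name is a placeholder, not from this diff), a run against the ShareGPT dataset downloaded above might look like:

```shell
# Latency benchmark: reports seconds per iteration.
# int4 weight-only quantization is tuned for batch size 1.
python benchmarks/benchmark_latency.py \
    --model <your-quantized-model> \
    --batch-size 1

# Serving benchmark: reports requests per second.
# Check `python benchmarks/benchmark_serving.py --help` for the exact
# dataset flag in your vLLM version.
python benchmarks/benchmark_serving.py \
    --model <your-quantized-model> \
    --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
```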