Update README.md
README.md
CHANGED
@@ -47,6 +47,9 @@ output = llm.chat(messages=messages, sampling_params=sampling_params)
 print(output[0].outputs[0].text)
 ```
 
+Note: please use `VLLM_DISABLE_COMPILE_CACHE=1` to disable the compile cache when running this code, e.g. `VLLM_DISABLE_COMPILE_CACHE=1 python example.py`, since there are some issues with the composability of compile in vLLM and torchao.
+This is expected to be resolved in PyTorch 2.8.
+
 ## Serving
 Then we can serve with the following command:
 ```Shell