fix typo #6
opened by luow-amd

README.md CHANGED
````diff
@@ -34,7 +34,7 @@ python3 quantize_quark.py \
     --multi_gpu
 ```
 ## Deployment
-Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the vLLM backend(
+Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the vLLM backend(vLLM-compatible).
 
 ## Evaluation
 Quark currently uses perplexity(PPL) as the evaluation metric for accuracy loss before and after quantization.The specific PPL algorithm can be referenced in the quantize_quark.py.
````
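For context on the Deployment section the PR touches, a minimal sketch of what serving a Quark-exported FP8 checkpoint through vLLM might look like. The export directory path and the `quantization` setting are assumptions, not taken from this repository; check the Quark and vLLM documentation for the exact values your versions expect.

```python
# A minimal sketch of serving a Quark-exported FP8 checkpoint with vLLM.
# The model path and quantization flag below are placeholder assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="./quark_fp8_export",  # hypothetical path to the Quark export directory
    quantization="fp8",          # assumption: FP8 weights recognized by vLLM
)
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```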
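The Evaluation section notes that PPL is computed in quantize_quark.py; that file is authoritative. As a rough illustration only, a minimal sliding-window perplexity sketch assuming a Hugging Face causal LM, where the model id, dataset, and window/stride are placeholder assumptions:

```python
# A minimal sliding-window perplexity (PPL) sketch, assuming a Hugging Face
# causal LM. Model id, dataset, and window/stride are placeholder assumptions;
# the authoritative algorithm lives in quantize_quark.py.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"  # placeholder: substitute the model under test
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

window, stride = 2048, 2048  # non-overlapping windows for simplicity
nlls, n_tokens = [], 0
for start in range(0, ids.size(1), stride):
    chunk = ids[:, start : start + window]
    if chunk.size(1) < 2:  # need at least one next-token prediction
        break
    with torch.no_grad():
        out = model(chunk, labels=chunk)  # HF shifts labels internally; .loss is mean NLL
    n = chunk.size(1) - 1                 # tokens actually predicted in this chunk
    nlls.append(out.loss * n)
    n_tokens += n

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)  # PPL = exp(total NLL / #tokens)
print(f"perplexity: {ppl.item():.4f}")
```

Comparing this value for the model before and after quantization gives the accuracy-loss measure the README describes.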