Update README.md
README.md
CHANGED
@@ -8,7 +8,7 @@ This model is optimized for use with [VLLM](https://github.com/vllm-project/vllm
 
 ### Key Features of FP8 Marlin
 
-The Marlin kernel achieves impressive efficiency by packing 4 8-bit values into an int32 and performing a 4xFP8 to 4xFP16/BF16 dequantization using bit arithmetic and SIMT operations. This approach yields nearly a **2x speedup** over FP16 on most models while maintaining **near lossless quality**.
+The NeuralMagic FP8 Marlin kernel achieves impressive efficiency by packing 4 8-bit values into an int32 and performing a 4xFP8 to 4xFP16/BF16 dequantization using bit arithmetic and SIMT operations. This approach yields nearly a **2x speedup** over FP16 on most models while maintaining **near lossless quality**.
 
 #### FP8 Advantages on NVIDIA GPUs
 
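The changed line describes packing four 8-bit values into an int32 and dequantizing them to FP16 with bit arithmetic. The following is a minimal NumPy sketch of one way such a conversion can work, not the kernel's actual code. It assumes the FP8 values use the E4M3 layout (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7), which the README text does not state; the function names `pack_fp8x4` and `dequant_fp8x4_to_fp16` are hypothetical.

```python
import numpy as np


def pack_fp8x4(fp8_bytes):
    """Pack groups of four FP8 bit patterns (uint8) into uint32 words.

    Byte 0 lands in the lowest 8 bits of each word (an assumed ordering).
    """
    b = np.ascontiguousarray(fp8_bytes, dtype=np.uint8).reshape(-1, 4)
    b = b.astype(np.uint32)
    return b[:, 0] | (b[:, 1] << 8) | (b[:, 2] << 16) | (b[:, 3] << 24)


def dequant_fp8x4_to_fp16(words):
    """Unpack each uint32 word into four FP16 values using shifts and masks.

    Assumes E4M3 input. The NaN encodings of E4M3 (0x7F, 0xFF) are not
    handled specially in this sketch.
    """
    words = np.asarray(words, dtype=np.uint32)
    out = np.empty((words.size, 4), dtype=np.float16)
    for i in range(4):
        byte = (words >> np.uint32(8 * i)) & np.uint32(0xFF)
        # Keep the sign in bit 15; move exponent and mantissa down so the
        # 4-bit exponent sits in the low bits of FP16's 5-bit exponent field.
        half_bits = ((byte & 0x80) << 8) | ((byte & 0x7F) << 7)
        half = half_bits.astype(np.uint16).view(np.float16)
        # The exponent biases differ by 15 - 7 = 8, so rescale by 2**8.
        out[:, i] = half * np.float16(256.0)
    return out.reshape(-1)


words = pack_fp8x4(np.array([0x38, 0x40, 0xC0, 0x7E], dtype=np.uint8))
values = dequant_fp8x4_to_fp16(words)
```

Because the rescale is a multiplication by a power of two, it is exact in FP16 for every finite E4M3 input, including subnormals. A GPU kernel would typically apply the same masks to several packed values per register rather than looping over bytes as this sketch does.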