chrisociepa committed on
Commit • 14f6256
Parent(s): 01bd498
Update README.md
README.md CHANGED
@@ -15,7 +15,7 @@ base_model: speakleash/Bielik-11B-v2.0-Instruct
<img src="https://huggingface.co/speakleash/Bielik-11B-v2/raw/main/speakleash_cyfronet.png">
</p>

-# Bielik-11B-v2.
+# Bielik-11B-v2.0-Instruct-FP8

This model was obtained by quantizing the weights and activations of [Bielik-11B-v2.0-Instruct](https://huggingface.co/speakleash/Bielik-11B-v2.0-Instruct) to the FP8 data type, ready for inference with vLLM >= 0.5.0 or SGLang.
AutoFP8 is used for quantization. This optimization reduces the number of bits per parameter from 16 to 8, cutting disk size and GPU memory requirements by approximately 50%.
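
The updated README states the model is ready for inference with vLLM >= 0.5.0. As a minimal sketch (not part of this commit), assuming the quantized checkpoint is published under the repo id matching the new heading, speakleash/Bielik-11B-v2.0-Instruct-FP8, loading it with vLLM could look like this:

```python
# Minimal vLLM inference sketch. The repo id is assumed from the new README
# heading; vLLM reads the FP8 quantization settings from the checkpoint config.
from vllm import LLM, SamplingParams

llm = LLM(model="speakleash/Bielik-11B-v2.0-Instruct-FP8")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Napisz krótki wiersz o Krakowie."], params)
print(outputs[0].outputs[0].text)
```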
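The README also says AutoFP8 was used to quantize both weights and activations. A hedged sketch of how such a checkpoint is typically produced with AutoFP8's documented API follows; the calibration prompt and output directory are illustrative placeholders, not the authors' actual setup:

```python
# Illustrative AutoFP8 quantization sketch based on the library's documented API.
# Calibration data and paths are placeholders, not the authors' actual pipeline.
from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

model_id = "speakleash/Bielik-11B-v2.0-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Static activation scales require a small calibration set (placeholder example).
examples = tokenizer(["Bielik to polski model językowy."], return_tensors="pt").to("cuda")

config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")
model = AutoFP8ForCausalLM.from_pretrained(model_id, quantize_config=config)
model.quantize(examples)
model.save_quantized("Bielik-11B-v2.0-Instruct-FP8")
```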