alexmarques committed
Commit 777080d
Parent: 213e837

Update README.md

Files changed (1)
  1. README.md +0 -2
README.md CHANGED
@@ -46,7 +46,6 @@ Weight quantization also reduces disk size requirements by approximately 50%.
 Only weights and activations of the linear operators within transformer blocks are quantized.
 Weights are quantized with a symmetric static per-channel scheme, where a fixed linear scaling factor is applied between FP8 and floating-point representations for each output channel dimension.
 Activations are quantized with a symmetric per-tensor scheme, where a fixed linear scaling factor is applied between FP8 and floating-point representations for the entire activation tensor.
-Linear scaling factors are computed by minimizing the mean squared error (MSE).
 Weights are quantized by rounding to the nearest FP8 representation.
 The [llm-compressor](https://github.com/vllm-project/llm-compressor) library was applied to quantize the model, using 512 sequences taken from Neural Magic's [LLM compression calibration dataset](https://huggingface.co/datasets/neuralmagic/LLM_compression_calibration).
 
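To make the scheme described above concrete, here is a minimal illustrative sketch, not llm-compressor's implementation, of symmetric FP8 quantization with per-channel weight scales and a per-tensor activation scale. It assumes PyTorch's `float8_e4m3fn` dtype (largest finite value 448); since this commit drops the MSE observer, the scales below are simply derived from max-abs values, and the tensor shapes are arbitrary examples:

```python
import torch

FP8_MAX = 448.0  # largest finite value of torch.float8_e4m3fn

def per_channel_scales(weight: torch.Tensor) -> torch.Tensor:
    # One scale per output channel (row), mapping that row's max |w| to FP8_MAX.
    return weight.abs().amax(dim=1, keepdim=True).clamp_min(1e-12) / FP8_MAX

def per_tensor_scale(x: torch.Tensor) -> torch.Tensor:
    # A single scale for the entire activation tensor.
    return x.abs().max().clamp_min(1e-12) / FP8_MAX

def fake_quant_fp8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Quantize-dequantize: divide by the scale, cast to FP8 (round-to-nearest),
    # then cast back and rescale to see the representable values.
    q = (x / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return q.to(x.dtype) * scale

# Weights: symmetric static per-channel; activations: symmetric per-tensor.
w = torch.randn(4096, 11008)   # [out_channels, in_channels] of one Linear layer
x = torch.randn(8, 11008)      # a calibration activation batch
w_dq = fake_quant_fp8(w, per_channel_scales(w))
x_dq = fake_quant_fp8(x, per_tensor_scale(x))
print((w - w_dq).pow(2).mean().item(), (x - x_dq).pow(2).mean().item())
```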
@@ -112,7 +111,6 @@ recipe = QuantizationModifier(
         targets="Linear",
         scheme="FP8",
         ignore=["lm_head"],
-        observer="mse",
     )
 ]
 
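The hunk above edits a recipe fragment from the README's quantization script. For context, a hedged sketch of how such a script typically looks with llm-compressor, with the recipe as it stands after this commit (the `observer="mse"` argument removed, falling back to the default observer); the model ID, sequence length, and the dataset's `text` column are assumptions for illustration:

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # hypothetical stand-in
NUM_SAMPLES = 512   # matches the README's calibration set size
MAX_SEQ_LEN = 2048  # assumed; the excerpt does not state it

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# 512 calibration sequences from Neural Magic's calibration dataset;
# the "text" column name is an assumption.
ds = load_dataset("neuralmagic/LLM_compression_calibration", split="train")
ds = ds.shuffle(seed=42).select(range(NUM_SAMPLES))
ds = ds.map(
    lambda batch: tokenizer(
        batch["text"], max_length=MAX_SEQ_LEN, truncation=True,
        add_special_tokens=False,
    ),
    batched=True,
    remove_columns=ds.column_names,
)

# The recipe after this commit: no observer argument.
recipe = [
    QuantizationModifier(
        targets="Linear",
        scheme="FP8",
        ignore=["lm_head"],
    )
]

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQ_LEN,
    num_calibration_samples=NUM_SAMPLES,
)
model.save_pretrained("model-FP8")
tokenizer.save_pretrained("model-FP8")
```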