noneUsername/Mistral-Nemo-Instruct-2407-W8A8-Dynamic-Per-Token

8-bit precision

noneUsername committed on Oct 4, 2024

Commit 658687b · verified · 1 Parent(s): 206e1c7

Create README.md

Files changed (1)

README.md +11 -0

README.md ADDED

	@@ -0,0 +1,11 @@

+This is my first quantization. It uses the INT8 quantization method provided by vLLM:
+https://docs.vllm.ai/en/latest/quantization/int8.html
+NUM_CALIBRATION_SAMPLES = 2048
+MAX_SEQUENCE_LENGTH = 8192
+smoothing_strength=0.8
+I will verify the validity of the model and update the README as soon as possible.
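
For readers unfamiliar with the settings above, here is a minimal sketch in plain Python of what `smoothing_strength` and the "Dynamic-Per-Token" part of the model name usually refer to. It assumes the standard SmoothQuant scale formula and symmetric per-token INT8 activation quantization. The function names are hypothetical, and this is an illustration only, not the vLLM or llm-compressor implementation used to produce this model.

```python
from typing import List, Tuple

# Value taken from the README; every name below is hypothetical.
SMOOTHING_STRENGTH = 0.8


def smoothing_scales(act_absmax: List[float], weight_absmax: List[float],
                     alpha: float = SMOOTHING_STRENGTH) -> List[float]:
    """SmoothQuant-style per-channel scale: s_j = max|X_j|**alpha / max|W_j|**(1 - alpha).

    Activations are divided by s_j and weights multiplied by s_j, which moves
    part of the activation outlier range into the weights before quantization.
    """
    return [a ** alpha / w ** (1.0 - alpha)
            for a, w in zip(act_absmax, weight_absmax)]


def quantize_per_token(row: List[float]) -> Tuple[List[int], float]:
    """Symmetric INT8 quantization of one token's activations.

    The scale is computed from the row itself at run time ("dynamic"),
    and each token gets its own scale ("per token").
    """
    absmax = max(abs(v) for v in row)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in row]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map INT8 values back to floats using the token's scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    token = [1.0, -0.5, 0.25]
    q, scale = quantize_per_token(token)
    print(q, scale)
    print(dequantize(q, scale))
    print(smoothing_scales([16.0], [1.0]))
```

A higher `alpha` shifts more of the quantization difficulty from activations to weights; the README's value of 0.8 sits on the activation-smoothing side of the commonly cited 0.5 midpoint.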