Quantization of model
#1
by
kuliev-vitaly
- opened
The model is too large to run even on 2× A100. Quantization should help with the hardware requirements.
Could you please provide AWQ (4-bit) or FP8 quantizations?