Llama3-Taiwan-70B-Instruct-128K-AWQ-4bits uses weights quantized to 4 bits with AutoAWQ, substantially reducing GPU memory requirements compared to the full-precision model.
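As a minimal sketch, an AWQ-quantized checkpoint like this one can typically be loaded through `transformers` (with `autoawq` installed), which picks up the quantization settings from the checkpoint's config. The Hub repository path below is an assumption based on the model name; substitute the actual repo ID. Imports are deferred inside the function so the sketch can be read without the heavy dependencies installed.

```python
# Hypothetical Hub path -- replace with the model's actual repo ID.
MODEL_ID = "Llama3-Taiwan-70B-Instruct-128K-AWQ-4bits"


def load_model(model_id: str = MODEL_ID):
    """Load the AWQ-quantized checkpoint with transformers.

    Requires `transformers` and `autoawq`; imports are deferred so
    this sketch is readable without them installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # The 4-bit AWQ weights are detected from the checkpoint's
    # quantization config; device_map="auto" spreads the layers
    # across available GPUs.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model
```

Calling `load_model()` then returns a tokenizer/model pair ready for generation; a 70B model at 4 bits still needs on the order of 40 GB of GPU memory, so multi-GPU or a large single accelerator is assumed.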
For detailed documentation, see the links provided.