MADLAD-400-10B-MT - GGUF

Description

This repo contains GGUF format model files for MADLAD-400-10B-MT for use with llama.cpp and compatible software.

The model was converted to GGUF with llama.cpp's convert_hf_to_gguf.py and quantized with llama.cpp's llama-quantize, both from llama.cpp version b3325.
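For anyone who wants to reproduce the files, the two steps can be sketched as below. This is an illustration, not the exact invocation used for this repo: the directory and file names are hypothetical, the f16 intermediate file is an assumption, and the helper functions only build the command lines rather than run them.

```python
from pathlib import Path


def build_convert_cmd(hf_model_dir, f16_out, llama_cpp_dir="llama.cpp"):
    """Command line for converting a Hugging Face checkpoint to GGUF.

    Assumes a local llama.cpp checkout at `llama_cpp_dir` (hypothetical path)
    and an f16 intermediate file, which the source does not confirm.
    """
    script = Path(llama_cpp_dir) / "convert_hf_to_gguf.py"
    return [
        "python", str(script), str(hf_model_dir),
        "--outfile", str(f16_out),
        "--outtype", "f16",
    ]


def build_quantize_cmd(f16_in, quant_out, quant_type, llama_cpp_dir="llama.cpp"):
    """Command line for llama-quantize: input file, output file, quant type."""
    binary = Path(llama_cpp_dir) / "llama-quantize"
    return [str(binary), str(f16_in), str(quant_out), quant_type]


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the shape of the commands.
    convert = build_convert_cmd("madlad400-10b-mt", "model-f16.gguf")
    quantize = build_quantize_cmd("model-f16.gguf", "model-q4_k_m.gguf", "Q4_K_M")
    print(" ".join(convert))
    print(" ".join(quantize))
    # To actually run them: subprocess.run(convert, check=True), and so on.
```

The quant type passed to llama-quantize matches the "Quant method" column in the table below (Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0).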

Name	Quant method	Bits	Size	VRAM required
model-q3_k_m.gguf	Q3_K_M	3	4.9 GB	5.7 GB
model-q4_k_m.gguf	Q4_K_M	4	6.3 GB	7.1 GB
model-q5_k_m.gguf	Q5_K_M	5	7.2 GB	7.9 GB
model-q6_k.gguf	Q6_K	6	8.2 GB	8.9 GB
model-q8_0.gguf	Q8_0	8	11 GB	11.3 GB

Note: the VRAM figures above were observed with all layers offloaded to the GPU, on Linux with an NVIDIA GPU.
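As a rough aid for choosing a file, the sketch below picks the largest quantization whose observed VRAM figure fits a given budget. The numbers are copied from the table above; the function name and the headroom_gb parameter are hypothetical, and actual usage may differ with context length, platform, and llama.cpp version.

```python
# (file name, quant method, observed VRAM in GB), copied from the table above.
QUANTS = [
    ("model-q3_k_m.gguf", "Q3_K_M", 5.7),
    ("model-q4_k_m.gguf", "Q4_K_M", 7.1),
    ("model-q5_k_m.gguf", "Q5_K_M", 7.9),
    ("model-q6_k.gguf", "Q6_K", 8.9),
    ("model-q8_0.gguf", "Q8_0", 11.3),
]


def best_quant_for(vram_gb, headroom_gb=0.0):
    """Return the file name of the largest quant that fits, or None.

    `headroom_gb` is an optional safety margin (an assumption of this sketch)
    subtracted from the available VRAM before comparing.
    """
    budget = vram_gb - headroom_gb
    fitting = [q for q in QUANTS if q[2] <= budget]
    if not fitting:
        return None
    # Higher VRAM use corresponds to more bits per weight in this table.
    return max(fitting, key=lambda q: q[2])[0]


if __name__ == "__main__":
    for gb in (6, 8, 12):
        print(gb, "GB ->", best_quant_for(gb))
```

If no file fits the budget, the function returns None; in that case offloading only some layers to the GPU is an option, though the figures above do not cover partial offloading.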