MADLAD-400-10B-MT - GGUF

Description

This repo contains GGUF format model files for MADLAD-400-10B-MT for use with llama.cpp and compatible software.
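As a rough usage sketch, the snippet below builds a `llama-cli` command line for one of these files. It assumes the `<2xx>` target-language prefix described in the upstream MADLAD-400 model card (not stated in this repo), a `llama-cli` binary on your `PATH`, and a build recent enough to support T5 models. The helper name `translate_command` and the default of 99 offloaded layers are illustrative only.

```python
import shlex


def translate_command(model_path, target_lang, text, gpu_layers=99):
    """Build (but do not run) a llama-cli invocation for one translation.

    Assumption: MADLAD-400 expects the prompt to start with a "<2xx>" token
    naming the target language, e.g. "<2de>" for German.
    """
    prompt = f"<2{target_lang}> {text}"
    return [
        "llama-cli",
        "-m", model_path,         # GGUF file from this repo
        "-p", prompt,             # prompt with target-language prefix
        "-ngl", str(gpu_layers),  # number of layers to offload to the GPU
    ]


if __name__ == "__main__":
    cmd = translate_command("model-q4_k_m.gguf", "de", "Hello, world!")
    print(shlex.join(cmd))
```

Lower `gpu_layers` if the chosen file does not fit in VRAM; the remaining layers then run on the CPU.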

Converted to GGUF with llama.cpp's convert_hf_to_gguf.py and quantized with llama.cpp's llama-quantize (llama.cpp version b3325).
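The two steps above might look roughly like the following. This is a sketch, not the exact commands used for this repo: the directory names, the intermediate `model-f16.gguf` file, and the `f16` output type are assumptions, and the helper `conversion_commands` is hypothetical. It only assembles the command lines so you can inspect them before running anything.

```python
import shlex
from pathlib import PurePosixPath


def conversion_commands(hf_dir, out_dir, quant="Q4_K_M"):
    """Return the convert and quantize command lines as argument lists.

    Assumptions: convert_hf_to_gguf.py is in the current directory,
    llama-quantize is on PATH, and an f16 GGUF is used as the intermediate.
    """
    out = PurePosixPath(out_dir)
    f16 = out / "model-f16.gguf"
    quantized = out / f"model-{quant.lower()}.gguf"
    convert = [
        "python", "convert_hf_to_gguf.py", str(hf_dir),
        "--outfile", str(f16),
        "--outtype", "f16",
    ]
    # llama-quantize takes: <input gguf> <output gguf> <quant type>
    quantize = ["llama-quantize", str(f16), str(quantized), quant]
    return [convert, quantize]


if __name__ == "__main__":
    for cmd in conversion_commands("madlad400-10b-mt", "out"):
        print(shlex.join(cmd))
```

Passing a different `quant` value (for example `Q8_0`) changes both the quantization type and the output file name to match the naming used in this repo.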

Provided files

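| Name              | Quant method | Bits | Size   | VRAM required |
|-------------------|--------------|------|--------|---------------|
| model-q3_k_m.gguf | Q3_K_M       | 3    | 4.9 GB | 5.7 GB        |
| model-q4_k_m.gguf | Q4_K_M       | 4    | 6.3 GB | 7.1 GB        |
| model-q5_k_m.gguf | Q5_K_M       | 5    | 7.2 GB | 7.9 GB        |
| model-q6_k.gguf   | Q6_K         | 6    | 8.2 GB | 8.9 GB        |
| model-q8_0.gguf   | Q8_0         | 8    | 11 GB  | 11.3 GB       |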

Note: the VRAM figures above were observed with all layers offloaded to the GPU, on Linux with an NVIDIA GPU.
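To pick a file for a given GPU, one option is to take the largest quantization whose observed VRAM figure fits your budget. The sketch below hard-codes the figures from the table above; the function name `largest_fitting` and the 0.5 GB headroom default are illustrative assumptions, and actual usage will vary with context length, driver, and platform.

```python
# (file name, observed VRAM in GB with all layers offloaded), from the table above
QUANTS = [
    ("model-q3_k_m.gguf", 5.7),
    ("model-q4_k_m.gguf", 7.1),
    ("model-q5_k_m.gguf", 7.9),
    ("model-q6_k.gguf", 8.9),
    ("model-q8_0.gguf", 11.3),
]


def largest_fitting(vram_gb, headroom_gb=0.5):
    """Return the highest-VRAM file that fits in vram_gb minus headroom, or None."""
    budget = vram_gb - headroom_gb
    fitting = [(name, need) for name, need in QUANTS if need <= budget]
    if not fitting:
        return None  # nothing fits fully; consider partial offload instead
    return max(fitting, key=lambda item: item[1])[0]


if __name__ == "__main__":
    for vram in (6, 8, 12):
        print(vram, "GB ->", largest_fitting(vram))
```

When nothing fits, the function returns `None`; in that case offloading only some layers to the GPU is the usual fallback.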

Model size: 10.7B params
Architecture: t5