	---
	base_model: google/madlad400-10b-mt
	inference: false
	license: apache-2.0
	model_name: madlad400-10b-mt-gguf
	pipeline_tag: translation
	---

	# MADLAD-400-10B-MT - GGUF

	- Original model: [MADLAD-400-10B-MT](https://huggingface.co/google/madlad400-10b-mt)

	## Description

	This repo contains GGUF format model files for [MADLAD-400-10B-MT](https://huggingface.co/google/madlad400-10b-mt) for
	use with [llama.cpp](https://github.com/ggerganov/llama.cpp) and compatible software.
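
As a rough sketch of what running one of these files might look like, the Python snippet below builds (but does not run) a `llama-cli` command line. It assumes the `<2xx>` target-language prefix described in the original model card and llama.cpp's `-m`, `-p`, and `-ngl` options; the helper names, the model path, and the `-ngl 99` value are illustrative assumptions, not tested settings for this repo.

```python
import shlex


def build_prompt(target_lang: str, text: str) -> str:
    """Prefix the text with a MADLAD-style target-language tag, e.g. "<2de>"."""
    return f"<2{target_lang}> {text}"


def build_llama_cli_command(model_path: str, target_lang: str, text: str,
                            gpu_layers: int = 99) -> list[str]:
    """Build a llama-cli argument list (hypothetical helper; nothing is executed)."""
    return [
        "llama-cli",
        "-m", model_path,                        # GGUF file from this repo
        "-p", build_prompt(target_lang, text),   # prompt with language tag
        "-ngl", str(gpu_layers),                 # assumed: offload all layers to the GPU
    ]


if __name__ == "__main__":
    cmd = build_llama_cli_command("model-q4_k_m.gguf", "de", "How are you?")
    print(shlex.join(cmd))
```

Returning the arguments as a list keeps quoting explicit, so the result can be passed to `subprocess.run` unchanged if desired.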

	The files were converted to GGUF with llama.cpp's [convert_hf_to_gguf.py](https://github.com/ggerganov/llama.cpp/blob/master/convert_hf_to_gguf.py)
	and quantized with llama.cpp's `llama-quantize`, using llama.cpp version [b3325](https://github.com/ggerganov/llama.cpp/commits/b3325).
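
The two steps could be reproduced roughly as follows. This is a minimal sketch that only assembles the command lines: the directory and file names are hypothetical, and the f16 intermediate file is an assumption, since the exact options used for this repo are not recorded here.

```python
import shlex


def build_conversion_commands(hf_model_dir: str, f16_path: str,
                              out_path: str, quant_type: str) -> list[list[str]]:
    """Return the convert and quantize commands as argument lists (not executed)."""
    convert = [
        "python", "convert_hf_to_gguf.py", hf_model_dir,
        "--outfile", f16_path,
        "--outtype", "f16",   # assumed intermediate precision
    ]
    # llama-quantize takes: input file, output file, quantization type
    quantize = ["llama-quantize", f16_path, out_path, quant_type]
    return [convert, quantize]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in build_conversion_commands(
        "madlad400-10b-mt", "model-f16.gguf", "model-q4_k_m.gguf", "Q4_K_M"
    ):
        print(shlex.join(cmd))
```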


	## Provided files

	| Name | Quant method | Bits | Size | VRAM required |
	| ---- | ---- | ---- | ---- | ---- |
	| [model-q3_k_m.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q3_k_m.gguf) | Q3_K_M | 3 | 4.9 GB | 5.7 GB |
	| [model-q4_k_m.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q4_k_m.gguf) | Q4_K_M | 4 | 6.3 GB | 7.1 GB |
	| [model-q5_k_m.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q5_k_m.gguf) | Q5_K_M | 5 | 7.2 GB | 7.9 GB |
	| [model-q6_k.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q6_k.gguf) | Q6_K | 6 | 8.2 GB | 8.9 GB |
	| [model-q8_0.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q8_0.gguf) | Q8_0 | 8 | 11 GB | 11.3 GB |

	Note: the VRAM figures above were observed with all layers offloaded to the GPU, on Linux with an NVIDIA GPU.
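
To make the table easier to act on, here is a small sketch that picks the largest file whose observed VRAM figure fits a given budget. The VRAM numbers are copied from the table; the helper names are hypothetical, and the `resolve/main` download URL pattern is an assumption based on the `blob/main` links above. Real usage may need extra headroom for context and other processes.

```python
from typing import Optional

# (file name, quant method, observed VRAM in GB), copied from the table above
QUANTS = [
    ("model-q3_k_m.gguf", "Q3_K_M", 5.7),
    ("model-q4_k_m.gguf", "Q4_K_M", 7.1),
    ("model-q5_k_m.gguf", "Q5_K_M", 7.9),
    ("model-q6_k.gguf", "Q6_K", 8.9),
    ("model-q8_0.gguf", "Q8_0", 11.3),
]

# Assumed direct-download form of the repo's blob/main links.
BASE_URL = "https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/resolve/main/"


def pick_quant(vram_budget_gb: float) -> Optional[str]:
    """Return the file with the highest VRAM figure that still fits, or None."""
    fitting = [q for q in QUANTS if q[2] <= vram_budget_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: q[2])[0]


def download_url(file_name: str) -> str:
    """Build a download URL for a file in this repo (assumed URL pattern)."""
    return BASE_URL + file_name


if __name__ == "__main__":
    choice = pick_quant(8.0)
    if choice is not None:
        print(choice, download_url(choice))
```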