mradermacher/Llama-3.1-Minitron-4B-Width-Base-GGUF

mradermacher committed on Sep 8

Commit 3fc844a • 1 parent: bf0df8a

auto-patch README.md

Files changed (1)

README.md +2 -2

@@ -3,7 +3,8 @@ base_model: nvidia/Llama-3.1-Minitron-4B-Width-Base
 language:
 - en
 library_name: transformers
-no_imatrix: "cvs/llama.cpp/ggml/src/ggml.c:6399: GGML_ASSERT(c->ne[0] >= n_dims / 2) failed"
+no_imatrix: 'cvs/llama.cpp/ggml/src/ggml.c:6399: GGML_ASSERT(c->ne[0] >= n_dims /
+  2) failed'
 quantized_by: mradermacher
 ---
 ## About
@@ -16,7 +17,6 @@ quantized_by: mradermacher
 static quants of https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base
 <!-- provided-files -->
-weighted/imatrix quants seem not to be available (by me) at this time. If they do not show up a week or so after the static ones, I have probably not planned for them. Feel free to request them by opening a Community Discussion.
 ## Usage
 If you are unsure how to use GGUF files, refer to one of [TheBloke's