nxnhjrjtbjfzhrovwl/hermes-limarp-13b-ggml-f16

This repository contains the unquantized Hermes+LIMARP merge in ggml format.

You can quantize the f16 ggml model to the quantization of your choice by following the steps below:

Download and extract the llama.cpp binaries (or compile them yourself if you're on Linux).
Move the "quantize" executable to the same folder where you downloaded the f16 ggml model.
Open a command prompt window in that same folder and enter the following command. The first argument is the input f16 file, the second is the output file name, and the third is the quantization type; change the last two to match the quantization you want.

quantize.exe hermes-limarp-13b.ggmlv3.f16.bin hermes-limarp-13b.ggmlv3.q4_0.bin q4_0

Press Enter to run the command. The quantized model will be written to the same folder.
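
If you want to produce several quantizations, the command above can be wrapped in a small script. The following is only a sketch, not part of llama.cpp: the helper names (quantized_name, build_command, run_quantize) are made up for illustration, and it assumes the quantize executable sits in the same folder as the f16 file and takes its arguments in the same order as the command shown above. Which quantization types are accepted depends on your llama.cpp build.

```python
import subprocess


def quantized_name(f16_name: str, quant_type: str) -> str:
    """Derive the output file name by swapping the 'f16' tag for the quant type."""
    if ".f16." not in f16_name:
        raise ValueError(f"expected '.f16.' in file name, got: {f16_name}")
    return f16_name.replace(".f16.", f".{quant_type}.")


def build_command(quantize_bin: str, f16_name: str, quant_type: str) -> list[str]:
    """Build the argument list: executable, input file, output file, quant type."""
    return [quantize_bin, f16_name, quantized_name(f16_name, quant_type), quant_type]


def run_quantize(quantize_bin: str, f16_name: str, quant_type: str) -> None:
    """Run the quantize executable; raises CalledProcessError if it fails."""
    cmd = build_command(quantize_bin, f16_name, quant_type)
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```

For example, calling run_quantize("quantize.exe", "hermes-limarp-13b.ggmlv3.f16.bin", "q4_0") from the model folder would run the same command as above. On Linux the executable would typically be passed as "./quantize" instead.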