This repository contains the unquantized Hermes+LIMARP merge in ggml format.
You can quantize the f16 ggml to the quantization of your choice by following the steps below:
- Download and extract the llama.cpp binaries (or compile them yourself if you're on Linux)
- Move the "quantize" executable to the same folder where you downloaded the f16 ggml model.
- Open a command prompt window in that same folder and enter the following command, changing the file names and quantization type as needed:
quantize.exe hermes-limarp-13b.ggmlv3.f16.bin hermes-limarp-13b.ggmlv3.q4_0.bin q4_0
- Press Enter to run the command; the quantized model will be generated in that folder.
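
If you want several quantizations from the same f16 file, the command above can be scripted. The following is only a rough Python sketch, not part of llama.cpp: the helper names (`output_name`, `build_quantize_command`, `quantize`) are made up for illustration, and it assumes the `quantize.exe` binary and the f16 model sit in the same folder and that the file name contains `.f16.` like the one in this repository. It builds the same `quantize.exe <input> <output> <type>` command shown above.

```python
import subprocess
from pathlib import Path


def output_name(f16_name, quant_type):
    """Derive the output file name by swapping the 'f16' tag for the quant type."""
    if ".f16." not in f16_name:
        raise ValueError(f"expected '.f16.' in file name, got {f16_name!r}")
    return f16_name.replace(".f16.", f".{quant_type}.")


def build_quantize_command(f16_name, quant_type, exe="quantize.exe"):
    """Return the argument list: <exe> <input> <output> <type>."""
    return [exe, f16_name, output_name(f16_name, quant_type), quant_type]


def quantize(f16_name, quant_type, folder=".", exe="quantize.exe"):
    """Run the quantize binary inside `folder` and return the output path.

    Assumption: the binary and the f16 model are both in `folder`.
    """
    folder = Path(folder).resolve()
    # Use an absolute path to the binary so it is found regardless of
    # how the platform resolves relative executables.
    cmd = build_quantize_command(f16_name, quant_type, exe=str(folder / exe))
    subprocess.run(cmd, cwd=folder, check=True)
    return folder / cmd[2]
```

For example, `quantize("hermes-limarp-13b.ggmlv3.f16.bin", "q4_0")` would run the same command as in the steps above. Calling it in a loop over other type names (say `q5_1` or `q8_0`, if your llama.cpp build supports them) would produce one output file per type. On Linux, pass the name of your compiled binary via `exe`.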