shabdsnuti committed on
Commit 4326e39 · 1 Parent(s): 375c316

quantised q5_k_m GGUFv2 file model

Files changed (1)
  1. README.md +12 -12
README.md CHANGED
@@ -68,11 +68,11 @@ They are also compatible with many third party UIs and libraries - please see th
  GGML_TYPE_Q5_K - "type-1" 5-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 5.5 bpw.
 
  ## Models
- >| Name | Quant method | Bits | Size | Max RAM required | Use case |
- >| ---- | ---- | ---- | ---- | ---- | ----- |
- >| [ggml-model-q5km.gguf](https://huggingface.co/kalpsnuti/llama-213-chat-gguf/blob/main/ggml-model-q5km.gguf) | Q5_K_M | 5 | 8.6 GB| 11.73 GB | large, very low quality loss|
- >
- >**Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
+ | Name | Quant method | Bits | Size | Max RAM required | Use case |
+ | ---- | ---- | ---- | ---- | ---- | ----- |
+ | [ggml-model-q5km.gguf](https://huggingface.co/kalpsnuti/llama-213-chat-gguf/blob/main/ggml-model-q5km.gguf) | Q5_K_M | 5 | 8.6 GB| 11.73 GB | large, very low quality loss|
+
+ **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
 
  ## Downloading the GGUF file(s)
  ### using manual `download`
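
As a rough sanity check of the size column in the hunk above (an estimate only, assuming Llama 2 13B's roughly 13 × 10⁹ weights at the nominal 5.5 bpw across all tensors, which is a simplification since Q5_K_M keeps some tensors at higher precision):

`13e9 weights × 5.5 bits/weight ÷ 8 bits/byte ≈ 8.9e9 bytes ≈ 8.3 GiB`

This lands in the same range as the listed 8.6 GB; the exact figure depends on the per-tensor type mix and on whether the size is reported in GB or GiB.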
@@ -99,7 +99,7 @@ huggingface-cli download kalpsnuti/llama-213-chat-gguf ggml-model-q5km.gguf --lo
  ```
  [*huggingface.co/docs => Hub Python Library => HOW-TO GUIDES => Download files*](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli) has full documentation on downloading with `huggingface-cli`.
  ```shell
- # downloads on fast connections (1Gbit/s or higher)
+ #downloads on fast connections (1Gbit/s or higher)
  pip3 install hf_transfer
  ```
  ##### ...first set the environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
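
To tie the pieces in this hunk together, here is a minimal sketch of the accelerated download; it assumes a recent `huggingface_hub` CLI (with the `--local-dir` flag) and that `hf_transfer` has been installed as shown above:

```shell
# Turn on the hf_transfer backend for this shell session
export HF_HUB_ENABLE_HF_TRANSFER=1
# Fetch only the Q5_K_M GGUF file into the current directory
huggingface-cli download kalpsnuti/llama-213-chat-gguf ggml-model-q5km.gguf --local-dir .
```

On slower connections the same `huggingface-cli download` call works without the environment variable, just without the speed-up.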
@@ -117,12 +117,12 @@ Clone and cd to the [llama.cpp](https://github.com/ggerganov/llama.cpp/commit/24
  ```
  ##### first run screenshot...
  ![How are you today?](first_run.png "Ragini first words")
- > **Options - set as appropriate**
- > `-ngl 32` indicates `32` layers to offload to GPU. Remove if GPU acceleration is not available.
- > `-c 4096` indicates `4k` context length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.
- > `-p <PROMPT>` indicates the *conversation style*, change to `-i` *or* `--interactive` to interact by giving `<PROMPT>` in chat style.
- >
- > *The [llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md) has detailed information on the ***above & other*** model running parameters.*
+ **Options - set as appropriate**
+ `-ngl 32` indicates `32` layers to offload to GPU. Remove if GPU acceleration is not available.
+ `-c 4096` indicates `4k` context length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.
+ `-p <PROMPT>` indicates the *conversation style*, change to `-i` *or* `--interactive` to interact by giving `<PROMPT>` in chat style.
+
+ *The [llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md) has detailed information on the ***above & other*** model running parameters.*
 
  ## Thanks
  Thanks **TheBlokeAI** team for inspirations!
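
For reference, a hedged example of how the options discussed in the hunk above combine on the command line; the binary name `./main` and the model path are assumptions, since the actual run command is outside this diff:

```shell
# One-shot prompt, 32 layers offloaded to the GPU, 4k context
./main -m ggml-model-q5km.gguf -ngl 32 -c 4096 -p "How are you today?"

# Chat-style session instead: drop -p <PROMPT> and use interactive mode
./main -m ggml-model-q5km.gguf -ngl 32 -c 4096 -i
```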
 