Triangle104's picture
Update README.md
21c8592 verified
|
raw
history blame
2.84 kB
metadata
base_model: concedo/Beepo-22B
language:
  - en
tags:
  - llama-cpp
  - gguf-my-repo

Triangle104/Beepo-22B-Q5_K_S-GGUF

This model was converted to GGUF format from concedo/Beepo-22B using llama.cpp via the ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.


Model details:

This is a finetune done on top of https://huggingface.co/mistralai/Mistral-Small-Instruct-2409 making it less censored in general, while attempting to maintain excellent instruct capabilities.

Key Features:

Retains Intelligence - LR was kept low and dataset heavily pruned to avoid losing too much of the original model's intelligence.
Instruct prompt format supports Alpaca - Honestly, I don't know why more models don't use it. If you are an Alpaca format lover like me, this should help. The original Mistral instruct format can still be used, but is not recommended.
Instruct Decensoring Applied - You should not need a jailbreak for a model to obey the user. The model should always do what you tell it to. No need for weird "Sure, I will" or kitten-murdering-threat tricks. No abliteration was done, only finetuning. This model is not evil. It does not judge or moralize. Like a good tool, it simply obeys.

You can obtain the GGUF quantization of this model here: https://huggingface.co/concedo/Beepo-22B-GGUF

Prompt template: Alpaca

Instruction:

{prompt}

Response:

Please leave any feedback or issues that you may have.


Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)

brew install llama.cpp

Invoke the llama.cpp server or the CLI.

CLI:

llama-cli --hf-repo Triangle104/Beepo-22B-Q5_K_S-GGUF --hf-file beepo-22b-q5_k_s.gguf -p "The meaning to life and the universe is"

Server:

llama-server --hf-repo Triangle104/Beepo-22B-Q5_K_S-GGUF --hf-file beepo-22b-q5_k_s.gguf -c 2048

Note: You can also use this checkpoint directly through the usage steps listed in the Llama.cpp repo as well.

Step 1: Clone llama.cpp from GitHub.

git clone https://github.com/ggerganov/llama.cpp

Step 2: Move into the llama.cpp folder and build it with LLAMA_CURL=1 flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).

cd llama.cpp && LLAMA_CURL=1 make

Step 3: Run inference through the main binary.

./llama-cli --hf-repo Triangle104/Beepo-22B-Q5_K_S-GGUF --hf-file beepo-22b-q5_k_s.gguf -p "The meaning to life and the universe is"

or

./llama-server --hf-repo Triangle104/Beepo-22B-Q5_K_S-GGUF --hf-file beepo-22b-q5_k_s.gguf -c 2048