|
--- |
|
base_model: concedo/Beepo-22B |
|
language: |
|
- en |
|
tags: |
|
- llama-cpp |
|
- gguf-my-repo |
|
--- |
|
|
|
# Triangle104/Beepo-22B-Q8_0-GGUF |
|
This model was converted to GGUF format from [`concedo/Beepo-22B`](https://huggingface.co/concedo/Beepo-22B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space. |
|
Refer to the [original model card](https://huggingface.co/concedo/Beepo-22B) for more details on the model. |
|
|
|
--- |
|
## Model details |
|
This is a finetune of [Mistral-Small-Instruct-2409](https://huggingface.co/mistralai/Mistral-Small-Instruct-2409) that makes the model less censored in general while attempting to maintain excellent instruct capabilities. |
|
|
|
## Key Features |
|
- Retains Intelligence - The learning rate (LR) was kept low and the dataset was heavily pruned to avoid losing too much of the original model's intelligence. |
|
- Instruct prompt format supports Alpaca - Honestly, I don't know why more models don't use it. If you are an Alpaca format lover like me, this should help. The original Mistral instruct format can still be used, but is not recommended. |
|
- Instruct Decensoring Applied - You should not need a jailbreak for a model to obey the user. The model should always do what you tell it to. No need for weird "Sure, I will" or kitten-murdering-threat tricks. No abliteration was done, only finetuning. This model is not evil. It does not judge or moralize. Like a good tool, it simply obeys. |
|
|
|
You can obtain the GGUF quantization of this model here: https://huggingface.co/concedo/Beepo-22B-GGUF |
|
|
|
Prompt template: Alpaca |
|
|
|
### Instruction: |
|
{prompt} |
|
|
|
### Response: |
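
The Alpaca template above can be filled in from the shell before the text is handed to the model. The following is a minimal sketch, not part of the original card: `alpaca_prompt` is a hypothetical helper name, and the commented `llama-cli` call simply reuses the repo and file names from the usage section of this card.

```bash
# Hypothetical helper (an assumption, not from the original model card):
# wrap a single instruction in the Alpaca template.
alpaca_prompt() {
  # $1 is the instruction text. The template ends with a newline after
  # "### Response:" so the completion starts on a fresh line.
  printf '### Instruction:\n%s\n\n### Response:\n' "$1"
}

# Print the formatted prompt for inspection.
alpaca_prompt "Name three primary colors."

# Assumed usage with llama-cli. Note that "$(...)" strips the trailing newline.
#   llama-cli --hf-repo Triangle104/Beepo-22B-Q8_0-GGUF \
#     --hf-file beepo-22b-q8_0.gguf \
#     -p "$(alpaca_prompt 'Name three primary colors.')"
```

Keeping the template in one function avoids retyping the `### Instruction:` and `### Response:` markers by hand, which is where most formatting slips tend to happen.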
|
|
|
Please leave any feedback or issues that you may have. |
|
|
|
--- |
|
## Use with llama.cpp |
|
Install llama.cpp through brew (works on macOS and Linux): |
|
|
|
```bash |
|
brew install llama.cpp |
|
|
|
``` |
|
Invoke the llama.cpp server or the CLI. |
|
|
|
### CLI: |
|
```bash |
|
llama-cli --hf-repo Triangle104/Beepo-22B-Q8_0-GGUF --hf-file beepo-22b-q8_0.gguf -p "The meaning to life and the universe is" |
|
``` |
|
|
|
### Server: |
|
```bash |
|
llama-server --hf-repo Triangle104/Beepo-22B-Q8_0-GGUF --hf-file beepo-22b-q8_0.gguf -c 2048 |
|
``` |
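
Once the server is running, it can be queried over HTTP. The sketch below assumes the server listens on its default address (`http://localhost:8080`) and uses the `/health` and `/completion` endpoints with the `prompt` and `n_predict` fields. The instruction text and the token limit of 64 are illustrative choices, not values from the original card.

```bash
# Assumption: llama-server was started as shown above and listens on port 8080.
SERVER_URL="http://localhost:8080"

# JSON payload with an Alpaca-formatted prompt. Inside single quotes the
# two-character sequence \n reaches the server as a JSON newline escape.
payload='{"prompt": "### Instruction:\nName three primary colors.\n\n### Response:\n", "n_predict": 64}'

# Only send the request if the server answers its health check.
if curl -sf --max-time 2 "$SERVER_URL/health" >/dev/null 2>&1; then
  curl -s "$SERVER_URL/completion" \
    -H "Content-Type: application/json" \
    -d "$payload"
else
  echo "llama-server is not reachable at $SERVER_URL" >&2
fi
```

The health check keeps the script from hanging or printing a confusing error while the model is still downloading or loading.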
|
|
|
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo. |
|
|
|
Step 1: Clone llama.cpp from GitHub. |
|
```bash |
|
git clone https://github.com/ggerganov/llama.cpp |
|
``` |
|
|
|
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with any hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux). |
|
```bash |
|
cd llama.cpp && LLAMA_CURL=1 make |
|
``` |
|
|
|
Step 3: Run inference through the main binary. |
|
```bash |
|
./llama-cli --hf-repo Triangle104/Beepo-22B-Q8_0-GGUF --hf-file beepo-22b-q8_0.gguf -p "The meaning to life and the universe is" |
|
``` |
|
or |
|
```bash |
|
./llama-server --hf-repo Triangle104/Beepo-22B-Q8_0-GGUF --hf-file beepo-22b-q8_0.gguf -c 2048 |
|
``` |
|
|