---
language:
- en
license: cc-by-nc-4.0
model_name: Octopus-V4-GGUF
base_model: NexaAIDev/Octopus-v4
inference: false
model_creator: NexaAIDev
quantized_by: Nexa AI, Inc.
tags:
- function calling
- on-device language model
- gguf
- llama cpp
---
# Octopus V4-GGUF: Graph of language models
<p align="center">
- <a href="https://huggingface.co/NexaAIDev/Octopus-v4" target="_blank">Original Model</a>
- <a href="https://www.nexa4ai.com/" target="_blank">Nexa AI Website</a>
- <a href="https://github.com/NexaAI/octopus-v4" target="_blank">Octopus-v4 Github</a>
- <a href="https://arxiv.org/abs/2404.19296" target="_blank">ArXiv</a>
- <a href="https://huggingface.co/spaces/NexaAIDev/domain_llm_leaderboard" target="_blank">Domain LLM Leaderboard</a>
</p>
<p align="center" width="100%">
<a><img src="octopus-v4-logo.png" alt="nexa-octopus" style="width: 40%; min-width: 300px; display: block; margin: auto;"></a>
</p>
**Acknowledgement**:
We sincerely thank our community members, [Mingyuan](https://huggingface.co/ThunderBeee) and [Zoey](https://huggingface.co/ZY6), for their extraordinary contributions to this quantization effort. Please explore [Octopus-v4](https://huggingface.co/NexaAIDev/Octopus-v4) for our original Hugging Face model.
## (Recommended) Run with [llama.cpp](https://github.com/ggerganov/llama.cpp)
1. **Clone and compile:**
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Compile the source code:
make
```
2. **Prepare the Input Prompt File:**
Navigate to the `prompts` folder inside the `llama.cpp` directory, and create a new file named `chat-with-octopus.txt`.
`chat-with-octopus.txt`:
```bash
User:
```
3. **Execute the Model:**
Run the following command in the terminal:
```bash
./main -m ./path/to/octopus-v4-Q4_K_M.gguf -c 512 -b 2048 -n 256 -t 1 --repeat_penalty 1.0 --top_k 0 --top_p 1.0 --color -i -r "User:" -f prompts/chat-with-octopus.txt
```
Example prompt to interact with the model:
```bash
<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>
```
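The example prompt above follows a fixed template: a system instruction, the user query, and an open assistant turn. As a minimal sketch, the template can be wrapped in a small shell function so that only the query changes. The function name `build_prompt` is hypothetical and is not part of llama.cpp or the Octopus release.

```bash
# Wrap a user query in the router prompt template shown above.
# build_prompt is a hypothetical helper name used for illustration only.
build_prompt() {
  printf '<|system|>%s<|end|><|user|>%s<|end|><|assistant|>' \
    "You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function." \
    "$1"
}

# Print a ready-to-paste prompt for the interactive session.
build_prompt "Tell me the result of derivative of x^3 when x is 2?"
echo
```

The printed line can be pasted at the `User:` prompt of the interactive session.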
## Run with [Ollama](https://github.com/ollama/ollama)
Since our models have not been uploaded to the Ollama server, please download the models and manually import them into [Ollama](https://github.com/ollama/ollama) by following these steps:
1. Locate the local Ollama directory:
```bash
cd ollama
```
2. Create a `Modelfile` in your directory. Include a `FROM` statement with the path to your local model, followed by the generation parameters:
```bash
FROM ./path/to/octopus-v4-Q4_K_M.gguf
PARAMETER temperature 0
PARAMETER num_ctx 1024
PARAMETER stop <nexa_end>
```
3. Use the following command to add the model to Ollama:
```bash
ollama create octopus-v4-Q4_K_M -f Modelfile
```
4. Verify that the model has been successfully imported:
```bash
ollama ls
```
### Run the model
```bash
ollama run octopus-v4-Q4_K_M "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```
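For scripted, non-interactive use, the command above can be wrapped so that the query and the model name become arguments. This is a sketch under assumptions: `route_query` is a hypothetical name, and the default model name assumes the model was imported as `octopus-v4-Q4_K_M` in the steps above.

```bash
# Send one query through the router prompt via `ollama run`.
# route_query is a hypothetical wrapper; $1 is the query and $2 is an
# optional model name (defaults to the name used with `ollama create`).
route_query() {
  ollama run "${2:-octopus-v4-Q4_K_M}" \
    "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>${1}<|end|><|assistant|>"
}
```

Calling `route_query "Tell me the result of derivative of x^3 when x is 2?"` then issues the same request as the command above.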
### Dataset and Benchmark
* Used questions from [MMLU](https://github.com/hendrycks/test) to evaluate performance.
* Evaluated with the Ollama [llm-benchmark](https://github.com/MinhNgyuen/llm-benchmark) method.
## Quantized GGUF Models
| Name                   | Quant method | Bits | Size    | Response (tokens/second) | Use Cases                                 |
| ---------------------- | ------------ | ---- | ------- | ---------------------- | ----------------------------------------- |
| Octopus-v4.gguf | | | 7.64 GB | 27.64 | extremely large |
| Octopus-v4-Q2_K.gguf | Q2_K | 2 | 1.42 GB | 54.20 | extremely not recommended, high loss |
| Octopus-v4-Q3_K.gguf | Q3_K | 3 | 1.96 GB | 51.22 | not recommended |
| Octopus-v4-Q3_K_S.gguf | Q3_K_S | 3 | 1.68 GB | 51.78 | not very recommended |
| Octopus-v4-Q3_K_M.gguf | Q3_K_M | 3 | 1.96 GB | 50.86 | not very recommended |
| Octopus-v4-Q3_K_L.gguf | Q3_K_L | 3 | 2.09 GB | 50.05 | not very recommended |
| Octopus-v4-Q4_0.gguf | Q4_0 | 4 | 2.18 GB | 65.76 | good quality, recommended |
| Octopus-v4-Q4_1.gguf | Q4_1 | 4 | 2.41 GB | 69.01 | slow, good quality, recommended |
| Octopus-v4-Q4_K.gguf | Q4_K | 4 | 2.39 GB | 55.76 | slow, good quality, recommended |
| Octopus-v4-Q4_K_S.gguf | Q4_K_S | 4 | 2.19 GB | 53.98 | high quality, recommended |
| Octopus-v4-Q4_K_M.gguf | Q4_K_M | 4 | 2.39 GB | 58.39 | some functions loss, not very recommended |
| Octopus-v4-Q5_0.gguf | Q5_0 | 5 | 2.64 GB | 61.98 | slow, good quality |
| Octopus-v4-Q5_1.gguf | Q5_1 | 5 | 2.87 GB | 63.44 | slow, good quality |
| Octopus-v4-Q5_K.gguf | Q5_K | 5 | 2.82 GB | 58.28 | moderate speed, recommended |
| Octopus-v4-Q5_K_S.gguf | Q5_K_S | 5 | 2.64 GB | 59.95 | moderate speed, recommended |
| Octopus-v4-Q5_K_M.gguf | Q5_K_M | 5 | 2.82 GB | 53.31 | fast, good quality, recommended |
| Octopus-v4-Q6_K.gguf | Q6_K | 6 | 3.14 GB | 52.15 | large, not very recommended |
| Octopus-v4-Q8_0.gguf | Q8_0 | 8 | 4.06 GB | 50.10 | very large, good quality |
| Octopus-v4-f16.gguf | f16 | 16 | 7.64 GB | 30.61 | extremely large |
_Quantized with llama.cpp_