---
language:
  - en
license: cc-by-nc-4.0
model_name: Octopus-V4-GGUF
base_model: NexaAIDev/Octopus-v4
inference: false
model_creator: NexaAIDev
quantized_by: Nexa AI, Inc.
tags:
  - function calling
  - on-device language model
  - gguf
  - llama cpp
---
# Octopus V4-GGUF: Graph of language models


<p align="center">
- <a href="https://huggingface.co/NexaAIDev/Octopus-v4" target="_blank">Original Model</a>
- <a href="https://www.nexa4ai.com/" target="_blank">Nexa AI Website</a>
- <a href="https://github.com/NexaAI/octopus-v4" target="_blank">Octopus-v4 Github</a>
- <a href="https://arxiv.org/abs/2404.19296" target="_blank">ArXiv</a>
- <a href="https://huggingface.co/spaces/NexaAIDev/domain_llm_leaderboard" target="_blank">Domain LLM Leaderbaord</a>
</p>

<p align="center" width="100%">
  <a><img src="octopus-v4-logo.png" alt="nexa-octopus" style="width: 40%; min-width: 300px; display: block; margin: auto;"></a>
</p>

**Acknowledgement**:  
We sincerely thank our community members, [Mingyuan](https://huggingface.co/ThunderBeee) and [Zoey](https://huggingface.co/ZY6), for their extraordinary contributions to this quantization effort. Please explore [Octopus-v4](https://huggingface.co/NexaAIDev/Octopus-v4) for our original Hugging Face model.

## (Recommended) Run with [llama.cpp](https://github.com/ggerganov/llama.cpp)

1. **Clone and compile:**

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Compile the source code:
make
```
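Next, download the quantized model you want to run. A minimal sketch using `huggingface-cli` (the repo id `NexaAIDev/Octopus-v4-GGUF` and the filename are assumptions; substitute the quantization you pick from the table below):

```bash
# Assumed repo id and filename; adjust to the file you want from the table below.
huggingface-cli download NexaAIDev/Octopus-v4-GGUF Octopus-v4-Q4_K_M.gguf --local-dir .
```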

2. **Prepare the Input Prompt File:**
   
Navigate to the `prompts` folder inside the `llama.cpp` repository and create a new file named `chat-with-octopus.txt`.

   `chat-with-octopus.txt`:

```bash
User: 
```
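
Equivalently, the file can be created from the shell (run inside the `llama.cpp` directory):

```bash
echo "User: " > prompts/chat-with-octopus.txt
```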
  
3. **Execute the Model:**

Run the following command in the terminal:

```bash
./main -m ./path/to/octopus-v4-Q4_K_M.gguf -c 512 -b 2048 -n 256 -t 1 --repeat_penalty 1.0 --top_k 0 --top_p 1.0 --color -i -r "User:" -f prompts/chat-with-octopus.txt
```
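
Here, `-m` selects the GGUF file, `-c 512` sets the context size, `-n 256` caps the number of generated tokens, `-i` starts interactive mode, `-r "User:"` hands control back to you whenever the model emits `User:`, and `-f` loads the prompt file created above.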

Example prompt to interact with the model:
```bash
<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>
```
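
For non-interactive, scripted use, the same prompt can also be passed directly with `-p` instead of the interactive flags (a sketch; path as above):

```bash
./main -m ./path/to/octopus-v4-Q4_K_M.gguf -n 256 --repeat_penalty 1.0 --top_k 0 --top_p 1.0 \
  -p "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```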

## Run with [Ollama](https://github.com/ollama/ollama)

Since our models have not been uploaded to the Ollama server, please download the models and manually import them into [Ollama](https://github.com/ollama/ollama) by following these steps:

1. Locate the local Ollama directory:
```bash
cd ollama
```

2. Create a `Modelfile` in your directory with a `FROM` statement pointing to your local model, plus the generation parameters:

```bash
FROM ./path/to/octopus-v4-Q4_K_M.gguf
PARAMETER temperature 0
PARAMETER num_ctx 1024
PARAMETER stop <nexa_end>
```

3. Use the following command to add the model to Ollama:

```bash
ollama create octopus-v4-Q4_K_M -f Modelfile
```

4. Verify that the model has been successfully imported:

```bash
ollama ls
```

### Run the model
```bash
ollama run octopus-v4-Q4_K_M "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```
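
Ollama also serves a local HTTP API (port 11434 by default), so the imported model can be queried programmatically. A minimal sketch; `raw` is set so the Octopus prompt format above is passed through untouched:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "octopus-v4-Q4_K_M",
  "prompt": "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>",
  "raw": true,
  "stream": false
}'
```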

### Dataset and Benchmark

* Utilized questions from [MMLU](https://github.com/hendrycks/test) to evaluate performance.
* Evaluated response speed with the Ollama-based [llm-benchmark](https://github.com/MinhNgyuen/llm-benchmark) tool.


## Quantized GGUF Models

| Name                   | Quant method | Bits | Size    | Response speed (tokens/second) | Use Cases                                              |
| ---------------------- | ------------ | ---- | ------- | ------------------------------ | ------------------------------------------------------ |
| Octopus-v4.gguf        |              |      | 7.64 GB | 27.64                          | extremely large                                        |
| Octopus-v4-Q2_K.gguf   | Q2_K         | 2    | 1.42 GB | 54.20                          | not recommended, high quality loss                     |
| Octopus-v4-Q3_K.gguf   | Q3_K         | 3    | 1.96 GB | 51.22                          | not recommended                                        |
| Octopus-v4-Q3_K_S.gguf | Q3_K_S       | 3    | 1.68 GB | 51.78                          | not generally recommended                              |
| Octopus-v4-Q3_K_M.gguf | Q3_K_M       | 3    | 1.96 GB | 50.86                          | not generally recommended                              |
| Octopus-v4-Q3_K_L.gguf | Q3_K_L       | 3    | 2.09 GB | 50.05                          | not generally recommended                              |
| Octopus-v4-Q4_0.gguf   | Q4_0         | 4    | 2.18 GB | 65.76                          | good quality, recommended                              |
| Octopus-v4-Q4_1.gguf   | Q4_1         | 4    | 2.41 GB | 69.01                          | slow, good quality, recommended                        |
| Octopus-v4-Q4_K.gguf   | Q4_K         | 4    | 2.39 GB | 55.76                          | slow, good quality, recommended                        |
| Octopus-v4-Q4_K_S.gguf | Q4_K_S       | 4    | 2.19 GB | 53.98                          | high quality, recommended                              |
| Octopus-v4-Q4_K_M.gguf | Q4_K_M       | 4    | 2.39 GB | 58.39                          | some function-calling loss, not generally recommended  |
| Octopus-v4-Q5_0.gguf   | Q5_0         | 5    | 2.64 GB | 61.98                          | slow, good quality                                     |
| Octopus-v4-Q5_1.gguf   | Q5_1         | 5    | 2.87 GB | 63.44                          | slow, good quality                                     |
| Octopus-v4-Q5_K.gguf   | Q5_K         | 5    | 2.82 GB | 58.28                          | moderate speed, recommended                            |
| Octopus-v4-Q5_K_S.gguf | Q5_K_S       | 5    | 2.64 GB | 59.95                          | moderate speed, recommended                            |
| Octopus-v4-Q5_K_M.gguf | Q5_K_M       | 5    | 2.82 GB | 53.31                          | fast, good quality, recommended                        |
| Octopus-v4-Q6_K.gguf   | Q6_K         | 6    | 3.14 GB | 52.15                          | large, not generally recommended                       |
| Octopus-v4-Q8_0.gguf   | Q8_0         | 8    | 4.06 GB | 50.10                          | very large, good quality                               |
| Octopus-v4-f16.gguf    | f16          | 16   | 7.64 GB | 30.61                          | extremely large                                        |

_Quantized with llama.cpp_