---
license: apache-2.0
language:
  - en
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - mistral
base_model: Severian/Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B
pipeline_tag: text-generation
quantized_by: Tanvir1337
---
# Tanvir1337/Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B-GGUF

This model has been quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp/), a high-performance inference engine for large language models.

## System Prompt Format

To interact with the model, use the following prompt format, replacing `{System}` with your system message and `{User}` with the user input:
```
{System}
### Prompt:
{User}
### Response:
```
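
As a minimal sketch, the template above can be filled in with a small helper. The function name `build_prompt` is hypothetical, and the trailing newline after `### Response:` is an assumption rather than something this card specifies:

```python
def build_prompt(system: str, user: str) -> str:
    """Fill the card's template: system text, '### Prompt:', user text, '### Response:'."""
    # Assumption: the model's reply starts on the line after '### Response:',
    # so the prompt ends with a newline. Drop it if your runtime prefers otherwise.
    return f"{system}\n### Prompt:\n{user}\n### Response:\n"


if __name__ == "__main__":
    print(build_prompt("You are a helpful assistant.", "What is GGUF?"))
```

Pass the resulting string as the raw prompt to whichever GGUF-compatible runtime you use.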

## Usage Instructions

If you're new to using GGUF files, refer to [TheBloke's README](https://huggingface.co/TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF) for detailed instructions.

## Quantization Options

The following graph compares quantization types by perplexity (lower is better):

![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)

For more information on quantization, see [Artefact2's notes](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9).

## Choosing the Right Model File

To select the optimal model file, consider the following factors:

1. **Memory constraints**: Determine how much RAM and/or VRAM you have available.
2. **Speed vs. quality**: If you prioritize speed, choose a model that fits within your GPU's VRAM. For maximum quality, consider a model that fits within the combined RAM and VRAM of your system.
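
The two factors above can be sketched as a small selection routine. This is an illustration only: the function name `pick_quant`, the 1 GB headroom, and the file sizes in the example are hypothetical placeholders, not the sizes of the files in this repository. It also assumes that a larger file means higher quality.

```python
from typing import Dict, Optional


def pick_quant(file_sizes_gb: Dict[str, float],
               budget_gb: float,
               headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest quant that fits in budget_gb minus headroom_gb, or None."""
    # Assumption: leave some memory free for context and runtime overhead.
    usable_gb = budget_gb - headroom_gb
    fitting = {name: size for name, size in file_sizes_gb.items() if size <= usable_gb}
    if not fitting:
        return None
    # Assumption: among files that fit, the largest one has the highest quality.
    return max(fitting, key=lambda name: fitting[name])


if __name__ == "__main__":
    # Hypothetical sizes; check the actual file listing before relying on them.
    sizes = {"Q3_K_M": 3.5, "Q5_K_M": 5.1, "Q8_0": 7.7}
    print(pick_quant(sizes, budget_gb=8.0))       # speed: VRAM only
    print(pick_quant(sizes, budget_gb=8.0 + 16))  # quality: VRAM + RAM
```

For speed, pass only your GPU's VRAM as `budget_gb`; for maximum quality, pass the combined RAM and VRAM.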

**Quantization formats**:

* **K-quants** (e.g., Q5_K_M): A good starting point, offering a balance between speed and quality.
* **I-quants** (e.g., IQ3_M): Newer and more efficient for their size, but may require a specific backend build (e.g., cuBLAS for NVIDIA or rocBLAS for AMD).

**Hardware compatibility**:

* **I-quants**: Not compatible with the Vulkan backend, which also runs on AMD cards. If you have an AMD card, make sure you are using the rocBLAS build rather than the Vulkan build, or a compatible inference engine.

For more information on the features and trade-offs of each quantization format, refer to the [llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix).