---
license: apache-2.0
language:
- en
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
base_model: Severian/Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B
pipeline_tag: text-generation
quantized_by: Tanvir1337
---
# Tanvir1337/Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B-GGUF
This model has been quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp/), a high-performance inference engine for large language models.
## System Prompt Format
To interact with the model, use the following prompt format:
```
{System}
### Prompt:
{User}
### Response:
```
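Below is a minimal sketch of how this template might be assembled and run with the `llama-cpp-python` bindings. The GGUF filename, the system/user strings, and the generation parameters are illustrative assumptions, not part of this repository.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B.Q5_K_M.gguf",  # hypothetical filename
    n_ctx=4096,  # context window size
)

system = "You are a helpful assistant."
user = "Summarize what a knowledge map is in two sentences."

# Assemble the prompt exactly as shown in the template above.
prompt = f"{system}\n### Prompt:\n{user}\n### Response:\n"

output = llm(prompt, max_tokens=256, stop=["### Prompt:"])
print(output["choices"][0]["text"])
```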
## Usage Instructions
If you're new to using GGUF files, refer to [TheBloke's README](https://huggingface.co/TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF) for detailed instructions.
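If you prefer to fetch a single quant file programmatically rather than through the browser, one common approach is `huggingface_hub`; the exact filename below is an assumption, so check the repository's file list for the real quant names.

```python
# One way to download a single GGUF quant (pip install huggingface_hub).
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="Tanvir1337/Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B-GGUF",
    filename="Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B.Q5_K_M.gguf",  # hypothetical quant name
)
print(path)  # local path to the downloaded file
```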
## Quantization Options
The following graph compares various quantization types (lower is better):
![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)
For more information on quantization, see [Artefact2's notes](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9).
## Choosing the Right Model File
To select the optimal model file, consider the following factors:
1. **Memory constraints**: Determine how much RAM and/or VRAM you have available.
2. **Speed vs. quality**: If you prioritize speed, choose a model that fits within your GPU's VRAM. For maximum quality, consider a model that fits within the combined RAM and VRAM of your system.
**Quantization formats**:
* **K-quants** (e.g., Q5_K_M): A good starting point, offering a balance between speed and quality.
* **I-quants** (e.g., IQ3_M): Newer and more efficient at a given size, but may require a specific backend build (e.g., cuBLAS for NVIDIA or rocBLAS for AMD GPUs).
**Hardware compatibility**:
* **I-quants**: Not compatible with Vulkan. If you have an AMD card, ensure you're using the rocBLAS build or a compatible inference engine.
For more information on the features and trade-offs of each quantization format, refer to the [llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix). |
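As a rough illustration of the speed-vs-quality trade-off above, the sketch below offloads every layer to the GPU when the chosen file (plus some headroom) fits in VRAM, and only a proportional share otherwise. The filename, the 8 GB VRAM figure, and the 1.2× headroom factor are assumptions; the 32-layer count is specific to Mistral-7B.

```python
# Sketch: pick how many layers to offload based on file size vs. available VRAM.
import os
from llama_cpp import Llama

model_file = "Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B.Q5_K_M.gguf"  # hypothetical
vram_bytes = 8 * 1024**3              # e.g. an 8 GB card -- adjust for your hardware
file_bytes = os.path.getsize(model_file)

total_layers = 32                     # Mistral-7B transformer layers
if file_bytes * 1.2 < vram_bytes:     # leave headroom for the KV cache
    n_gpu_layers = -1                 # -1 = offload all layers to the GPU
else:
    n_gpu_layers = int(total_layers * vram_bytes / (file_bytes * 1.2))

llm = Llama(model_path=model_file, n_gpu_layers=n_gpu_layers, n_ctx=4096)
```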