---
|
license: apache-2.0 |
|
language: |
|
- en |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- mistral |
|
base_model: Severian/Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B |
|
pipeline_tag: text-generation |
|
quantized_by: Tanvir1337 |
|
--- |
|
# Tanvir1337/Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B-GGUF |
|
|
|
This model has been quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp/), a high-performance inference engine for large language models. |
|
|
|
## System Prompt Format |
|
|
|
To interact with the model, use the following prompt format: |
|
``` |
|
{System} |
|
### Prompt: |
|
{User} |
|
### Response: |
|
``` |
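As a minimal sketch, the template above can be filled in with a small helper function (the function name and example strings below are illustrative, not part of the model):

```python
def build_prompt(system: str, user: str) -> str:
    """Fill the model's prompt template with a system message and user input."""
    return f"{system}\n\n### Prompt:\n{user}\n\n### Response:\n"

# Example usage with placeholder text:
prompt = build_prompt(
    "You are a helpful assistant.",
    "Summarize the benefits of quantization.",
)
print(prompt)
```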
|
|
|
## Usage Instructions |
|
|
|
If you're new to using GGUF files, refer to [TheBloke's README](https://huggingface.co/TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF) for detailed instructions. |
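For example, with a local llama.cpp build you can run a GGUF file directly. The sketch below assumes a recent llama.cpp checkout (where the binary is named `llama-cli`; older builds call it `main`), and the model filename is a placeholder for whichever quant you downloaded:

```shell
# Placeholder filename — substitute the quant file you actually downloaded.
MODEL=./Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B.Q5_K_M.gguf

# -ngl: number of layers to offload to the GPU (use 0 for CPU-only)
# -c:   context window size
./llama-cli -m "$MODEL" -ngl 33 -c 4096 \
  -p $'You are a helpful assistant.\n\n### Prompt:\nHello!\n\n### Response:\n'
```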
|
|
|
## Quantization Options |
|
|
|
The following graph compares various quantization types (lower is better): |
|
|
|
![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png) |
|
|
|
For more information on quantization, see [Artefact2's notes](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9). |
|
|
|
## Choosing the Right Model File |
|
|
|
To select the optimal model file, consider the following factors: |
|
|
|
1. **Memory constraints**: Determine how much RAM and/or VRAM you have available. |
|
2. **Speed vs. quality**: If you prioritize speed, choose a quant whose file size fits entirely within your GPU's VRAM. For maximum quality, choose the largest quant that fits within the combined RAM and VRAM of your system.
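As a rough way to apply the memory check above, file size scales with parameter count times average bits per weight. The bits-per-weight figures below are approximate averages for each quant type, not exact values read from any specific file:

```python
# Rough rule of thumb: file size ≈ parameter count × bits per weight / 8.
# Approximate average bits-per-weight for common GGUF quant types.
APPROX_BPW = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}

def approx_file_size_gb(n_params: float, quant: str) -> float:
    """Estimate GGUF file size in GB for a parameter count and quant type."""
    return n_params * APPROX_BPW[quant] / 8 / 1e9

# A 7B model at Q5_K_M lands near 5 GB; leave extra headroom
# for the context (KV cache) on top of the file size itself.
for quant in APPROX_BPW:
    print(f"{quant}: ~{approx_file_size_gb(7e9, quant):.1f} GB")
```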
|
|
|
**Quantization formats**: |
|
|
|
* **K-quants** (e.g., Q5_K_M): A good starting point, offering a balance between speed and quality. |
|
* **I-quants** (e.g., IQ3_M): Newer and more size-efficient at a given quality, but slower on CPU and best suited to builds with GPU offload support (e.g., cuBLAS for NVIDIA or rocBLAS for AMD).
|
|
|
**Hardware compatibility**: |
|
|
|
* **I-quants**: Not compatible with the Vulkan backend. If you have an AMD card, ensure you're using the rocBLAS (ROCm) build of llama.cpp or a compatible inference engine.
|
|
|
For more information on the features and trade-offs of each quantization format, refer to the [llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix). |