Natkituwu/Kunokukulemonchini-7b-7.1bpw-exl2


	---
	base_model:
	- grimjim/kukulemon-7B
	- Nitral-AI/Kunocchini-7b-128k-test
	library_name: transformers
	tags:
	- mergekit
	- merge
	- mistral
	- alpaca
	license: cc-by-nc-4.0
	---

	# Kunokukulemonchini-7b-7.1bpw-exl2

	This is a 7.1 bpw exl2 quant of the merge [icefog72/Kunokukulemonchini-7b](https://huggingface.co/icefog72/Kunokukulemonchini-7b).

	I wanted to replicate what IceFog did for 6GB cards (long context with good quality), scaled up to 8GB cards.

	Works great for people with 8GB of VRAM who are looking for both long context and quality.

	With a 4060 8GB I get 16k context and better-quality responses than with the 6.5bpw version.
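
As a rough sanity check on why 7.1 bpw fits an 8GB card, the weight footprint can be estimated from the parameter count and the bits per weight. This is only a back-of-the-envelope sketch: the parameter count (about 7.24 billion, typical of Mistral-7B-style models) is an assumption, and the estimate covers weights only, not the KV cache or runtime overhead.

```python
def quantized_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB (weights only)."""
    total_bytes = n_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 2**30                    # bytes -> GiB


# Assumption: ~7.24e9 parameters, typical of a Mistral-7B-style model.
N_PARAMS = 7.24e9

for bpw in (6.5, 7.1):
    size = quantized_weight_gib(N_PARAMS, bpw)
    print(f"{bpw} bpw -> about {size:.2f} GiB of weights")
```

Whatever VRAM is left after the weights has to hold the KV cache, so the usable context length depends on that remainder and on the cache precision the loader uses.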

	## Merge Details

	The kukulemon-7B config.json was slightly edited before the merge to get a context window of at least ~32k.
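
The card does not say which field was changed. One plausible way to make such an edit, assuming the change was to `max_position_embeddings` (a standard field in Mistral-style `config.json` files), is sketched below. The function names and the 32768 value are hypothetical and only illustrate the idea.

```python
import json
from pathlib import Path


def set_context_window(config: dict, n_positions: int) -> dict:
    """Return a copy of the config with a new maximum position count.

    Assumption: the relevant field is `max_position_embeddings`;
    the original card does not name the field that was edited.
    """
    updated = dict(config)
    updated["max_position_embeddings"] = n_positions
    return updated


def patch_config_file(path: str, n_positions: int = 32768) -> None:
    """Rewrite a config.json in place (hypothetical helper)."""
    config_path = Path(path)
    config = json.loads(config_path.read_text())
    patched = set_context_window(config, n_positions)
    config_path.write_text(json.dumps(patched, indent=2))
```

Calling `patch_config_file("kukulemon-7B/config.json")` on a local copy of the model would apply the change before running the merge. The path here is illustrative.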

	### Merge Method

	This model was merged using the SLERP merge method.
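
For readers unfamiliar with SLERP (spherical linear interpolation), here is a minimal pure-Python sketch of the idea on two flat weight vectors. It is not mergekit's implementation; the function name and the fallback to plain linear interpolation for near-parallel vectors are choices made for this illustration.

```python
import math


def slerp(t: float, v0: list[float], v1: list[float], eps: float = 1e-8) -> list[float]:
    """Spherically interpolate between v0 (t=0) and v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))

    def lerp() -> list[float]:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Degenerate inputs: fall back to linear interpolation.
    if norm0 < eps or norm1 < eps:
        return lerp()

    # Angle between the two vectors.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)

    # Nearly parallel (or anti-parallel) vectors: SLERP is ill-conditioned.
    if sin_theta < eps:
        return lerp()

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]
```

Unlike a plain weighted average, the interpolated vector follows the arc between the two inputs, so its magnitude does not shrink when the inputs point in different directions.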

	### Models Merged

	The following models were included in the merge:
	* [grimjim/kukulemon-7B](https://huggingface.co/grimjim/kukulemon-7B)
	* [Nitral-AI/Kunocchini-7b-128k-test](https://huggingface.co/Nitral-AI/Kunocchini-7b-128k-test)

	### Configuration

	The following YAML configuration was used to produce this model:

	```yaml
	slices:
	  - sources:
	      - model: grimjim/kukulemon-7B
	        layer_range: [0, 32]
	      - model: Nitral-AI/Kunocchini-7b-128k-test
	        layer_range: [0, 32]
	merge_method: slerp
	base_model: Nitral-AI/Kunocchini-7b-128k-test
	parameters:
	  t:
	    - filter: self_attn
	      value: [0, 0.5, 0.3, 0.7, 1]
	    - filter: mlp
	      value: [1, 0.5, 0.7, 0.3, 0]
	    - value: 0.5
	dtype: float16
	```
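
The list values for `t` in the configuration above act as a gradient across the layer stack: as far as I understand mergekit's behaviour, the anchors are spread evenly over the layers and interpolated linearly in between, with `t = 0` giving the base model's weights and `t = 1` the other model's. The sketch below illustrates that reading. The helper name and the exact mapping from layer index to position are assumptions, not mergekit's code.

```python
def gradient_value(anchors: list[float], layer_idx: int, num_layers: int) -> float:
    """Linearly interpolate a per-layer value from evenly spaced anchors.

    Assumption: layer 0 maps to the first anchor and the last layer
    to the last anchor, with straight-line interpolation in between.
    """
    if len(anchors) == 1 or num_layers == 1:
        return float(anchors[0])
    # Position of this layer on the anchor axis, in [0, len(anchors) - 1].
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]


SELF_ATTN_T = [0, 0.5, 0.3, 0.7, 1]
MLP_T = [1, 0.5, 0.7, 0.3, 0]

for layer in (0, 8, 16, 24, 31):
    attn_t = gradient_value(SELF_ATTN_T, layer, 32)
    mlp_t = gradient_value(MLP_T, layer, 32)
    print(f"layer {layer:2d}: self_attn t={attn_t:.3f}, mlp t={mlp_t:.3f}")
```

Under this reading, the self-attention and MLP gradients mirror each other (the two values sum to 1 at every layer), while tensors matched by neither filter use the constant `t = 0.5`.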