---
license: apache-2.0
language:
- en
---
|
<div align="center">

# TinyMix-8x1b

</div>
|
|
|
This is a MoE-ification of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) using the [Mixtral branch of mergekit](https://github.com/cg123/mergekit).
|
|
|
The goal was to MoE-fy the TinyLlama model and then use the result as a base model for further training. The intuition is that finetuning an 8x1B mixture should give better performance than finetuning a single 1B model on its own.
|
|
|
More work coming!
|
|
|
# Inference Template
|
This is a merge of the base model, so prompt it with plain-text completions rather than a chat template:
|
```
llm.generate('Quantum Tunneling is')
```
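
For example, with Hugging Face `transformers`, a minimal completion-style sketch looks like the following. The `model_id` below is a placeholder rather than the actual repo id, and `device_map="auto"` assumes `accelerate` is installed:

```
# Minimal sketch: plain completion inference with transformers.
# model_id is a placeholder -- point it at wherever TinyMix-8x1b is hosted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/TinyMix-8x1b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# No chat template: give the model a raw prompt and let it continue the text.
inputs = tokenizer("Quantum Tunneling is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```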
|
|
|
## Mergekit Config
|
```
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
```
|
|
|
# Eval
|
Thanks to u/mhenrichsen for the HellaSwag score.
|
|
|
| Tasks     | Version | Filter | n-shot | Metric   |  Value |    | Stderr |
|-----------|---------|--------|-------:|----------|-------:|----|-------:|
| hellaswag | Yaml    | none   |      0 | acc      | 0.4659 | ±  | 0.0050 |
|           |         | none   |      0 | acc_norm | 0.6044 | ±  | 0.0049 |