eastwind/tinymix-8x1b-chat

Commit cc2bd6e (parent 424b7b7), committed by eastwind on Jan 2: "Create README.md"

---
license: apache-2.0
datasets:
- cerebras/SlimPajama-627B
- bigcode/starcoderdata
- HuggingFaceH4/ultrachat_200k
- HuggingFaceH4/ultrafeedback_binarized
language:
- en
---
<div align="center">

# TinyMix-8x1b-Chat

</div>

This is a MoE-ification of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) using the [Mixtral branch of mergekit](https://github.com/cg123/mergekit): the dense TinyLlama model is turned into a Mixtral-style mixture-of-experts (MoE) model with eight experts.

The goal is to MoE-fy the TinyLlama model and then use the result as a base model for fine-tuning. The intuition is that fine-tuning the 8x1B model should give better performance than fine-tuning the 1B model on its own.

More work coming!
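
The "8x1b" in the name does not mean 8 × 1.1B parameters: in a Mixtral-style MoE only the MLP block of each layer is replicated per expert, while attention, norms and embeddings stay shared. The arithmetic below is a rough sketch under stated assumptions: TinyLlama-1.1B's published config (hidden size 2048, MLP width 5632, 22 layers, 32 query heads, 4 key/value heads, 32000-token vocabulary, untied embeddings) and two experts per token, which is the Mixtral default and is not overridden by the mergekit config on this card. Check the numbers against the merged model's `config.json`.

```python
from typing import Optional

# Assumed TinyLlama-1.1B config values; verify against the model's config.json.
HIDDEN = 2048        # hidden_size
INTERMEDIATE = 5632  # intermediate_size (MLP width)
LAYERS = 22          # num_hidden_layers
HEADS = 32           # num_attention_heads
KV_HEADS = 4         # num_key_value_heads (grouped-query attention)
VOCAB = 32000        # vocab_size


def count_params(num_experts: int = 1, active_experts: Optional[int] = None) -> int:
    """Parameter count of a LLaMA/Mixtral-style decoder whose MLP is copied into
    `num_experts` experts. With `active_experts`, count only the experts one token uses."""
    head_dim = HIDDEN // HEADS
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_HEADS * head_dim  # q/o + k/v projections
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up and down projections
    norms = 2 * HIDDEN               # two RMSNorm weight vectors per layer
    router = HIDDEN * num_experts if num_experts > 1 else 0  # MoE gate, evaluated for every token
    experts_used = num_experts if active_experts is None else active_experts
    per_layer = attention + experts_used * mlp + norms + router
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings + untied LM head
    return LAYERS * per_layer + embeddings + HIDDEN  # + final RMSNorm


for label, n in [
    ("TinyLlama 1.1B (dense)", count_params()),
    ("TinyMix 8x1b, total", count_params(num_experts=8)),
    ("TinyMix 8x1b, active per token (top-2)", count_params(num_experts=8, active_experts=2)),
]:
    print(f"{label}: {n:,} (~{n / 1e9:.2f}B)")
```

Under these assumptions the dense count comes out at about 1.10B, matching the base model's name, while the merge has about 6.43B parameters in total and uses about 1.86B of them per token.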
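## Chat Template

```python
def make_prompt(instruction):
    # Single-turn ChatML-style prompt; the trailing assistant tag asks the model to reply.
    return f"<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n"

# `llm` is assumed to be an already-loaded model exposing `generate(prompt)`,
# e.g. a vLLM `LLM` instance; it is not defined on this card.
llm.generate(make_prompt('What is quantum tunneling?'))
```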
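
The template above covers a single user turn. A natural extension for multi-turn chats is to repeat the same `<|im_start|>role ... <|im_end|>` block for every turn and then open an assistant turn. This layout is an assumption extrapolated from the single-turn format, not something this card specifies, and `make_chat_prompt` and `extract_reply` are hypothetical helpers. Trimming the output at `<|im_end|>` also assumes the model emits that marker when it finishes its turn.

```python
from typing import List, Tuple


def make_chat_prompt(messages: List[Tuple[str, str]]) -> str:
    """Render (role, content) turns with the same <|im_start|>/<|im_end|> markers
    as make_prompt, then open an assistant turn for the model to complete."""
    turns = [f"<|im_start|>{role}\n{content}<|im_end|>\n" for role, content in messages]
    return "".join(turns) + "<|im_start|>assistant\n"


def extract_reply(completion: str) -> str:
    """Keep the text before the first <|im_end|>, in case generation runs past the turn."""
    return completion.split("<|im_end|>", 1)[0].strip()


# A single user turn reproduces make_prompt's output exactly.
single = make_chat_prompt([("user", "What is quantum tunneling?")])

# Hypothetical multi-turn history.
history = make_chat_prompt([
    ("user", "What is quantum tunneling?"),
    ("assistant", "A particle crossing a barrier it classically could not pass."),
    ("user", "Give an everyday example."),
])
print(history)
```

Keeping the single-turn case identical to `make_prompt` means the multi-turn helper can replace it without changing what the model sees for one-off questions.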

## Mergekit Config

```yaml
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
# "hidden": router weights come from hidden-state representations of each expert's positive_prompts
gate_mode: hidden
dtype: bfloat16
# Eight experts, all copies of the same base model with an empty positive prompt
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
```
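
Every expert in this config is the same TinyLlama checkpoint with the same empty positive prompt, so the `hidden` gate mode should produce identical router rows for all eight experts in each layer, assuming mergekit derives each row only from the base model's hidden states for that prompt. With identical logits, the renormalized routing weights are uniform over the selected experts (0.5 each for two experts per token), and the merged layer returns exactly the dense MLP's output. In other words, the freshly merged model should compute the same function as TinyLlama, up to numerical precision, until fine-tuning pulls the experts apart. The toy sketch below checks this for one layer with random weights; the sizes and helper names are illustrative and are not mergekit or transformers code.

```python
import math
import random


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def silu(z):
    return z / (1.0 + math.exp(-z))


def llama_mlp(x, w_gate, w_up, w_down):
    # LLaMA-style gated MLP: down(silu(gate(x)) * up(x)).
    h = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, h)


def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]


def moe_mlp(x, router, experts, top_k=2):
    # Mixtral-style routing: score every expert, keep the top_k, and weight them by a
    # softmax over the kept logits (same as renormalizing a full softmax over the top_k).
    logits = matvec(router, x)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = llama_mlp(x, *experts[i])
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_ff, n_experts = 4, 8, 8  # toy sizes, not TinyLlama's real dimensions

# One dense MLP copied into all eight experts, mirroring the config above.
dense = (random_matrix(d_ff, d_model, rng),
         random_matrix(d_ff, d_model, rng),
         random_matrix(d_model, d_ff, rng))
experts = [dense] * n_experts

# Identical router rows, as identical source models and identical prompts should yield.
gate_row = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
router = [gate_row] * n_experts

x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
dense_out = llama_mlp(x, *dense)
moe_out = moe_mlp(x, router, experts)
print(max(abs(a - b) for a, b in zip(dense_out, moe_out)))  # prints 0.0
```

Changing any one expert's weights, as fine-tuning does, or giving the experts different router rows breaks this equality, which is where the extra capacity of the 8x1b model is meant to come from.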