eastwind committed on
Commit
cc2bd6e
1 Parent(s): 424b7b7

Create README.md

Files changed (1)
  1. README.md +52 -0
README.md ADDED
@@ -0,0 +1,52 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - cerebras/SlimPajama-627B
+ - bigcode/starcoderdata
+ - HuggingFaceH4/ultrachat_200k
+ - HuggingFaceH4/ultrafeedback_binarized
+ language:
+ - en
+ ---
+ <div align="center">
+
+ # TinyMix-8x1b-Chat
+ </div>
+
+ This is a MoE-ification of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) using the [Mixtral branch of mergekit](https://github.com/cg123/mergekit).
+
+ The goal was to MoE-fy the TinyLlama model and then use the result as a base model to finetune from. The intuition is that finetuning 8x1b should give better performance than finetuning 1b by itself.
+
+ More work coming!
+
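+ ## Chat Template
+ ```
+ def make_prompt(instruction):
+     # Wrap a single user instruction in the chat turn markers.
+     return f"<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n"
+
+ # `llm` is not defined in this README; it stands for whichever inference wrapper has loaded the model.
+ llm.generate(make_prompt('What is quantum tunneling?'))
+ ```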
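For multi-turn conversations, the same turn markers can be applied to every message in the history. The sketch below is only an illustration, assuming each turn follows the single-turn layout shown in the chat template; `build_chat_prompt` and the `messages` structure are hypothetical names and not part of this repository.

```python
def build_chat_prompt(messages):
    """Format a list of {"role", "content"} dicts with the chat turn markers.

    Hypothetical helper: assumes every turn uses the same
    <|im_start|>role ... <|im_end|> layout as the single-turn template.
    """
    turns = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    # End with an open assistant turn so the model continues from there.
    return "".join(turns) + "<|im_start|>assistant\n"


history = [
    {"role": "user", "content": "What is quantum tunneling?"},
    {"role": "assistant", "content": "A particle crossing a barrier it classically could not."},
    {"role": "user", "content": "Give an example."},
]
prompt = build_chat_prompt(history)
```

With a single user message, this produces the same string as `make_prompt`.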
+
+ ## Mergekit Config
+ ```
+ base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+ gate_mode: hidden
+ dtype: bfloat16
+ experts:
+   - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+     positive_prompts: [""]
+   - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+     positive_prompts: [""]
+   - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+     positive_prompts: [""]
+   - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+     positive_prompts: [""]
+   - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+     positive_prompts: [""]
+   - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+     positive_prompts: [""]
+   - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+     positive_prompts: [""]
+   - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+     positive_prompts: [""]
+ ```
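Since all eight experts are the same model with the same empty prompt, the config can also be generated instead of written by hand. The following is a minimal sketch that uses only the field names appearing in the config; `make_moe_config` is a hypothetical helper, and it emits plain text so no YAML library is needed.

```python
def make_moe_config(model, num_experts=8, gate_mode="hidden", dtype="bfloat16"):
    """Return MoE merge config text with `num_experts` copies of `model`."""
    lines = [
        f"base_model: {model}",
        f"gate_mode: {gate_mode}",
        f"dtype: {dtype}",
        "experts:",
    ]
    for _ in range(num_experts):
        # Each expert entry repeats the same source model and an empty prompt.
        lines.append(f"  - source_model: {model}")
        lines.append('    positive_prompts: [""]')
    return "\n".join(lines) + "\n"


config_text = make_moe_config("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
print(config_text)
```

The resulting text can be saved to a file and used in place of the hand-written config; changing `num_experts` gives a smaller or larger variant for experimentation.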