eastwind committed 038bf33 (parent f4f8c0c): Create README.md

---
license: apache-2.0
language:
- en
---
<div align="center">

# TinyMix-8x1b
</div>

This is a Mixture-of-Experts (MoE) version of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T), built with the [Mixtral branch of mergekit](https://github.com/cg123/mergekit) using eight copies of the model as experts.

The goal was to MoE-ify the TinyLlama model and then use the result as a base model for further training. The intuition is that fine-tuning an 8x1b model should give better performance than fine-tuning the 1b model on its own.

More work coming!

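The `8x1b` name counts experts rather than total size. In a Mixtral-style MoE only the MLP blocks are replicated per expert, while attention, embeddings and norms are shared, and each token is routed through only a subset of the experts. The sketch below does that arithmetic, assuming TinyLlama's published shape (hidden size 2048, MLP size 5632, 22 layers, 32 attention heads with 4 key/value heads, 32,000-token vocabulary, untied output head) and Mixtral-style top-2 routing; check both assumptions against the actual `config.json`.

```python
# Back-of-the-envelope parameter count: TinyLlama-1.1B vs. an 8-expert MoE of it.
# Shape values are assumptions taken from TinyLlama's published config.
HIDDEN = 2048        # hidden_size
INTERMEDIATE = 5632  # intermediate_size of each MLP
LAYERS = 22          # num_hidden_layers
HEADS = 32           # num_attention_heads
KV_HEADS = 4         # num_key_value_heads (grouped-query attention)
VOCAB = 32000        # vocab_size


def count_params(mlp_copies: int, num_experts: int) -> int:
    """Parameters in a Llama/Mixtral-style decoder with `mlp_copies` MLPs per layer.

    Pass mlp_copies == num_experts for the total size, or the number of experts
    each token is routed to for the parameters actually used per token.
    """
    head_dim = HIDDEN // HEADS
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_HEADS * head_dim  # q/o and k/v projections
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate_proj, up_proj, down_proj (no biases)
    router = HIDDEN * num_experts if num_experts > 1 else 0  # bias-free linear gate
    norms = 2 * HIDDEN  # two RMSNorm weight vectors per layer
    per_layer = attention + mlp_copies * mlp + router + norms
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings + untied lm_head
    return LAYERS * per_layer + embeddings + HIDDEN  # + final RMSNorm


dense = count_params(mlp_copies=1, num_experts=1)
moe_total = count_params(mlp_copies=8, num_experts=8)
moe_active = count_params(mlp_copies=2, num_experts=8)  # assumes top-2 routing

print(f"dense: {dense / 1e9:.2f}B")           # dense: 1.10B
print(f"8x1b total: {moe_total / 1e9:.2f}B")  # 8x1b total: 6.43B
print(f"8x1b per token: {moe_active / 1e9:.2f}B")  # 8x1b per token: 1.86B
```

Under these assumptions the merged model holds roughly 6.4B parameters, of which about 1.9B are used for each token, compared with 1.1B for the original model.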
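# Inference Template
This is a merge of the base model, so treat it as a completion model: give it the beginning of a text and let it continue, rather than prompting it in a chat format.
```python
# `llm` is a placeholder for whatever wrapper you use to load and call the model
llm.generate('Quantum Tunneling is')
```
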
## Mergekit Config
```yaml
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0  # supplies the shared (non-expert) weights
gate_mode: hidden  # router initialized from hidden-state representations of positive_prompts
dtype: bfloat16
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
```

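Because all eight experts in this config are copies of the same checkpoint, the freshly merged MoE layer should compute the same function as the original MLP: a Mixtral-style router renormalizes its top-2 weights so they sum to 1, and a weighted average of identical outputs is just that output. With identical `positive_prompts`, the `hidden` gate vectors should also come out identical, so the router starts with no preference between experts. The toy pure-Python sketch below illustrates this, assuming Mixtral-style top-2 routing; `make_mlp` is a hypothetical stand-in for TinyLlama's MLP, and none of this is mergekit code.

```python
import math
import random

NUM_EXPERTS = 8  # the eight entries under `experts:`
TOP_K = 2        # assumed Mixtral-style experts per token


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_mlp(weights):
    """Hypothetical toy stand-in for one TinyLlama MLP: y = tanh(W x)."""
    def mlp(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return mlp


def moe_forward(x, gate_rows, experts, top_k=TOP_K):
    """Mixtral-style routing: softmax over router logits, keep top-k, renormalize."""
    logits = [sum(g * xi for g, xi in zip(row, x)) for row in gate_rows]
    probs = softmax(logits)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(experts[0](x))
    for i in chosen:
        weight = probs[i] / norm  # the chosen weights sum to 1
        out = [o + weight * y for o, y in zip(out, experts[i](x))]
    return out


random.seed(0)
DIM = 4
shared = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
dense_mlp = make_mlp(shared)
experts = [make_mlp(shared) for _ in range(NUM_EXPERTS)]  # eight copies of one checkpoint

# Assumed effect of identical positive_prompts: every expert gets the same gate row.
gate_vec = [random.uniform(-1, 1) for _ in range(DIM)]
gate_rows = [list(gate_vec) for _ in range(NUM_EXPERTS)]

x = [random.uniform(-1, 1) for _ in range(DIM)]
moe_out = moe_forward(x, gate_rows, experts)
dense_out = dense_mlp(x)
print(max(abs(a - b) for a, b in zip(moe_out, dense_out)))  # 0.0
```

Replacing `gate_rows` with random vectors changes which experts are picked but not the output, since the experts are still identical; any specialization has to come from further training.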
# Eval
Thanks to u/mhenrichsen for the HellaSwag score.

| Tasks     | Version | Filter | n-shot | Metric   |  Value |   | Stderr |
|-----------|---------|--------|-------:|----------|-------:|---|-------:|
| hellaswag | Yaml    | none   |      0 | acc      | 0.4659 | ± | 0.0050 |
|           |         | none   |      0 | acc_norm | 0.6044 | ± | 0.0049 |
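In lm-evaluation-harness terms, `acc` marks an item correct when the gold ending has the highest total log-likelihood, while `acc_norm` first divides each ending's log-likelihood by its length in characters, which reduces the bias toward short endings. The `Stderr` column is consistent with the standard error of a mean of 0/1 outcomes over HellaSwag's 10,042 validation items. A small sketch of both, assuming those harness conventions; the log-likelihoods are made-up toy values, not model outputs.

```python
import math

# Made-up toy log-likelihoods for two candidate endings of one HellaSwag-style item.
endings = [" ran.", " walked slowly to the store."]
loglikelihoods = [-3.0, -5.0]


def pick(lls, choices, normalize):
    """Index of the chosen ending; `normalize` divides by character length (acc_norm)."""
    scores = [ll / len(c) if normalize else ll for ll, c in zip(lls, choices)]
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy_stderr(p, n):
    """Standard error of a mean of n 0/1 outcomes, using the sample (n - 1) variance."""
    return math.sqrt(p * (1 - p) / (n - 1))


print(pick(loglikelihoods, endings, normalize=False))  # 0: short ending wins on raw log-likelihood
print(pick(loglikelihoods, endings, normalize=True))   # 1: long ending wins once length-normalized

N = 10_042  # assumed size of the HellaSwag validation split
print(round(accuracy_stderr(0.4659, N), 4))  # 0.005
print(round(accuracy_stderr(0.6044, N), 4))  # 0.0049
```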