---
license: apache-2.0
language:
- en
---
|
<div align="center">

# TinyMix-8x1b

</div>
|
|
|
This is a MoE-ification of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) using the [Mixtral branch of mergekit](https://github.com/cg123/mergekit).
|
|
|
The goal was to MoE-fy the TinyLlama model and then use the result as a base model for further training. The intuition is that finetuning an 8x1B mixture should give better performance than finetuning a single 1B model on its own.
|
|
|
More work coming!
|
|
|
# Inference Template
|
This is a merge of the base model, so prompt it with plain-text completions rather than a chat template:
|
```
llm.generate('Quantum Tunneling is')
```
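
For example, with Hugging Face `transformers`, a minimal completion-style sketch looks like the following. The `model_id` below is a placeholder rather than the actual repo id, and `device_map="auto"` assumes `accelerate` is installed:

```
# Minimal sketch: plain completion inference with transformers.
# model_id is a placeholder -- point it at wherever TinyMix-8x1b is hosted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/TinyMix-8x1b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# No chat template: give the model a raw prompt and let it continue the text.
inputs = tokenizer("Quantum Tunneling is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```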
|
|
|
## Mergekit Config
|
```
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
```
|
|
|
# Eval
|
Thanks to u/mhenrichsen for the HellaSwag score.
|
|
|
| Tasks     | Version | Filter | n-shot | Metric   |  Value |    | Stderr |
|-----------|---------|--------|-------:|----------|-------:|----|-------:|
| hellaswag | Yaml    | none   |      0 | acc      | 0.4659 | ±  | 0.0050 |
|           |         | none   |      0 | acc_norm | 0.6044 | ±  | 0.0049 |