---
license: apache-2.0
datasets:
- HuggingFaceTB/cosmopedia
---
An untrained precursor MoE created from Cosmo using mergekit.
Gate routing was initialized using the prompt hidden state method. Five of the experts are based on the visualized topic clusters of the Cosmopedia data; the other three are task-oriented.
Layers 0, 1, and 2 were degenerate. The expert gates for those three layers have been randomly initialized in the hope of mitigating this.
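
The gate setup described above can be illustrated with a small, self-contained sketch. It is not mergekit's code. It assumes that each expert's gate row is the mean hidden state of that expert's prompts, that degenerate layers get small Gaussian weights instead, and that routing picks the top 2 experts by dot product. All function names, the toy sizes, `std=0.02`, and `k=2` are hypothetical choices made for illustration.

```python
import random


def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def gate_from_prompts(prompt_states):
    """One gate row per expert: the mean hidden state of that expert's prompts.

    prompt_states[e] is a list of hidden-state vectors, one per prompt of expert e.
    """
    return [mean_vector(states) for states in prompt_states]


def random_gate(num_experts, hidden_size, rng, std=0.02):
    """Fallback gate with small Gaussian weights (std is an assumed value)."""
    return [[rng.gauss(0.0, std) for _ in range(hidden_size)]
            for _ in range(num_experts)]


def build_gates(states_by_layer, degenerate_layers, rng):
    """Build one gate matrix per layer, using random weights for degenerate layers."""
    gates = []
    for layer, prompt_states in enumerate(states_by_layer):
        num_experts = len(prompt_states)
        hidden_size = len(prompt_states[0][0])
        if layer in degenerate_layers:
            gates.append(random_gate(num_experts, hidden_size, rng))
        else:
            gates.append(gate_from_prompts(prompt_states))
    return gates


def top_k_experts(gate, hidden, k=2):
    """Indices of the k experts whose gate rows score highest against `hidden`."""
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in gate]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]


# Toy data: 4 layers, 8 experts, 2 prompts per expert, hidden size 4.
rng = random.Random(0)
states_by_layer = [
    [[[rng.random() for _ in range(4)] for _ in range(2)] for _ in range(8)]
    for _ in range(4)
]

# Layers 0, 1, and 2 are treated as degenerate, matching the note above.
gates = build_gates(states_by_layer, degenerate_layers={0, 1, 2}, rng=rng)
chosen = top_k_experts(gates[3], [1.0, 0.0, 0.0, 0.0])
```

In this sketch only layer 3 carries prompt-derived routing. The gates for layers 0 to 2 are arbitrary until the model is trained, which is consistent with describing it as an untrained precursor.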