---
license: apache-2.0
datasets:
  - HuggingFaceTB/cosmopedia
---

An untrained precursor MoE created from Cosmo using mergekit.

Gate routing was initialized using the prompt hidden state method. Five experts are based on the visualized topic clusters of the Cosmopedia data; three are task-oriented.
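
For reference, a minimal sketch of what a mergekit-moe configuration for this kind of merge looks like. `gate_mode: hidden` is mergekit's name for the prompt hidden state initialization described above; the base model name, dtype, and prompts below are illustrative assumptions rather than the exact values used for this model, and only three of the eight experts are shown.

```yaml
# hypothetical mergekit-moe config sketch, not the actual one used here
base_model: HuggingFaceTB/cosmo-1b        # assumed base; "Cosmo" per the description
gate_mode: hidden                         # initialize gates from prompt hidden states
dtype: bfloat16
experts:
  # topic-cluster experts (five in total; two shown)
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts:
      - "Write a college-level textbook section explaining a concept in depth."
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts:
      - "Write a story that explains a scientific idea to a young child."
  # task-oriented experts (three in total; one shown)
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts:
      - "Follow the instruction below and answer the question step by step."
```

A config like this would then be built with something along the lines of `mergekit-moe config.yml ./cosmoem-8x1B`.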

The degenerate layers are 0, 1, and 2 (I believe this means the experts will be underutilized for the lowest-level features). This was the best I could do with test-and-try prompt-based routing. Further research might start from the reversed direction, if some interpretability tool supports it (working backward from layer activations to prompts).