---
license: apache-2.0
datasets:
- cerebras/SlimPajama-627B
language:
- en
---
|
|
|
This model accompanies the papers [MoM: Linear Sequence Modeling with Mixture-of-Memories](https://arxiv.org/abs/2502.13685) and [Retentive Network: A Successor to Transformer for Large Language Models](https://arxiv.org/abs/2307.08621).
|
|
|
The model was trained on a 15B-token sample of SlimPajama.
|
|
|
Due to changes to the MLP layer structure in the latest version of `fla`, these weights cannot be loaded with it. Install `fla` at [this commit](https://github.com/fla-org/flash-linear-attention/tree/8346a33792558d8e3eb206fe18404de037e11d9c) instead.
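
A minimal loading sketch, assuming the usual `fla` + `transformers` workflow (importing `fla` to register its architectures, then loading via the `Auto` classes); the repository id below is a placeholder for this model's Hub id.

```python
# Assumed install of fla pinned to the compatible commit:
#   pip install git+https://github.com/fla-org/flash-linear-attention.git@8346a33792558d8e3eb206fe18404de037e11d9c

import fla  # noqa: F401 -- importing fla registers its model classes with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-model's-hub-id>"  # placeholder: replace with this repository's id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Quick generation check
inputs = tokenizer("Linear sequence modeling", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```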