OpenNLPLab committed
Commit 39c0658
1 Parent(s): f3bfc9d

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -20,7 +20,7 @@ This official repository unveils the TransNormerLLM3 model along with its open-s
 
 [TransNormerLLM](https://arxiv.org/abs/2307.14995) evolved from [TransNormer](https://arxiv.org/abs/2210.10340) and stands out as the first LLM built on the linear transformer architecture. It further distinguishes itself as the first non-Transformer LLM to exceed both traditional Transformers and other efficient models (such as RetNet and Mamba) in speed and performance.
 
-> [email protected]: We plan to increase the sequence length in the pre-training stage to **10 million**: https://twitter.com/opennlplab/status/1776894730015789300
+> [email protected]: We plan to scale the sequence length in the pre-training stage to **10 million**: https://twitter.com/opennlplab/status/1776894730015789300
 
 # TransNormerLLM3
 - **TransNormerLLM3-15B** features **14.83 billion** parameters. It is structured with **42 layers**, includes **40 attention heads**, and has a total **embedding size of 5120**.
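As a rough sanity check on the model-shape bullet above, the sketch below turns the stated shape (42 layers, 40 heads, embedding size 5120) into a back-of-the-envelope parameter count. The GLU expansion ratio, vocabulary size, and untied embeddings are assumptions made for illustration, not values taken from this commit.

```python
# Back-of-the-envelope parameter count for the stated shape:
# 42 layers, 40 heads, embedding size 5120 (from the README bullet).
# Assumed (not stated in this commit): GLU expansion ratio ~8/3,
# a 64k vocabulary, untied input/output embeddings, and a standard
# q/k/v/out token-mixing block.

d_model, n_layers, n_heads = 5120, 42, 40
vocab_size = 64_000                  # assumption
glu_ratio = 8 / 3                    # assumption (~2.67)

head_dim = d_model // n_heads        # 128 dims per head
mixer = 4 * d_model ** 2             # q, k, v, out projections
glu = 3 * int(glu_ratio * d_model) * d_model  # gate, up, down matrices
per_layer = mixer + glu

embeddings = 2 * vocab_size * d_model        # input + output (untied)
total = n_layers * per_layer + embeddings
print(f"~{total / 1e9:.2f}B parameters")
# Prints ~13.87B; the gap to the quoted 14.83 billion suggests a larger
# GLU ratio or vocabulary than assumed here, plus norms and positional
# parameters not counted in this sketch.
```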