Update README.md

README.md
@@ -35,7 +35,7 @@ Based on growth technology, the Tele-FLM-1T model training is divided into three
 - SwiGLU for activation function
 - Linear bias disabled
 - Embedding and language model head untied
-- Input and output
+- Input and output multiplier

 Consequently, Tele-FLM-1T is largely compatible with Llama architecturally.
 To maximize convenience for the community, we made minimal adjustments to Llama's code to adapt it to Tele-FLM and released it as open source.
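The architectural choices listed in the diff (SwiGLU activation, linears without bias, untied embedding and language-model head, and input/output multipliers) can be illustrated with a minimal numpy sketch. This is an assumption-laden illustration, not the Tele-FLM implementation: all variable names (`W_embed`, `W_head`, `mul_in`, `mul_out`, etc.) and the specific multiplier values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, vocab = 8, 16, 32

# Untied: the embedding matrix and the LM head are separate parameters
# (hypothetical names), rather than sharing one weight matrix.
W_embed = rng.normal(size=(vocab, d_model))
W_head = rng.normal(size=(vocab, d_model))

# Hypothetical scalar input/output multipliers applied to the
# embedding output and to the logits, respectively.
mul_in, mul_out = 1.0, 1.0


def swiglu_ffn(x, W_gate, W_up, W_down):
    """SwiGLU feed-forward block; note there are no bias terms,
    matching the 'linear bias disabled' choice above."""
    silu = lambda z: z / (1.0 + np.exp(-z))  # SiLU = x * sigmoid(x)
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down


W_gate = rng.normal(size=(d_model, d_ff))
W_up = rng.normal(size=(d_model, d_ff))
W_down = rng.normal(size=(d_ff, d_model))

tokens = np.array([3, 7, 1])
h = mul_in * W_embed[tokens]                  # input multiplier on embeddings
h = h + swiglu_ffn(h, W_gate, W_up, W_down)   # residual SwiGLU block
logits = mul_out * (h @ W_head.T)             # output multiplier on logits
```

Because every piece above (SwiGLU, bias-free linears, untied head) also exists in Llama-style code, only the two multiplier scalings differ, which is consistent with the diff's claim that Tele-FLM-1T is largely Llama-compatible.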