ssmits/Falcon2-5.5B-multilingual

Text Generation

tiiuae/falcon-11B

text-generation-inference


ssmits committed on Jun 5, 2024

Commit ab408ab · verified · 1 Parent(s): 13633e2

Update README.md

Files changed (1)

README.md +2 -2


@@ -24,9 +24,9 @@ language:
 ---
 ## Why prune?
-Falcon-11B is still undertrained, as can be seen by this graph:
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/660c0a02cf274b3ab77dd6b7/QeaL9bOrPskustzFpjMUP.png)
-This is why the choice is made by prune 50% of the layers.
 Note that \~1B of continued pre-training (\~1M rows of 1k tokens) is still required to restore the perplexity of this model in the desired language.
 I'm planning on doing that for certain languages, depending on how much compute will be available.

 ---
 ## Why prune?
+Even though [Falcon-11B](https://huggingface.co/tiiuae/falcon-11B) is trained on 5T tokens, it is still undertrained, as can be seen by this graph:
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/660c0a02cf274b3ab77dd6b7/QeaL9bOrPskustzFpjMUP.png)
+This is why the choice is made to prune 50% of the layers.
 Note that \~1B of continued pre-training (\~1M rows of 1k tokens) is still required to restore the perplexity of this model in the desired language.
 I'm planning on doing that for certain languages, depending on how much compute will be available.
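
The figures in the README text can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the layer count of 60 used in the usage lines is an assumption that should be checked against the base model's configuration, not a value stated in this commit.

```python
def continued_pretraining_tokens(rows: int, tokens_per_row: int) -> int:
    """Total tokens seen in one pass over a dataset of fixed-length rows."""
    return rows * tokens_per_row


def layers_kept(num_layers: int, prune_percent: int) -> int:
    """Number of layers remaining after pruning a given percentage of them."""
    return num_layers - (num_layers * prune_percent) // 100


# ~1M rows of ~1k tokens each, as stated in the README text.
budget = continued_pretraining_tokens(rows=1_000_000, tokens_per_row=1_000)
print(f"continued pre-training budget: {budget:,} tokens")

# ASSUMPTION: 60 layers in the base model; verify against its config.
print(f"layers kept after 50% pruning: {layers_kept(60, 50)}")
```

With these inputs the token budget comes to one billion tokens, which matches the "\~1B" figure quoted in the README.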