Divyasreepat committed on
Update README.md with new model card content
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 - keras
 pipeline_tag: text-generation
 ---
-
+### Model Overview
 # Model Summary
 
 Falcon-RW-1B is a 1B-parameter causal decoder-only model built by [TII](https://www.tii.ae/) and trained on 350B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb). The architecture of the model is adapted from the GPT-3 paper ([Brown et al., 2020](https://arxiv.org/abs/2005.14165)), but it uses ALiBi.
@@ -80,4 +80,5 @@ The architecture is adapted from the GPT-3 paper ([Brown et al., 2020](https://a
   url={https://arxiv.org/abs/2306.01116},
   year={2023}
 }
 ```
+
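The summary in this card notes that Falcon-RW-1B follows the GPT-3 architecture but replaces learned positional embeddings with ALiBi (Press et al., 2022), which biases attention logits linearly by query-key distance. Below is a minimal sketch of those per-head biases; it is a generic illustration of the technique, not Falcon's actual implementation, and the helper `alibi_bias` is hypothetical.

```python
import numpy as np

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """Illustrative ALiBi biases: slope * (key pos - query pos) per head."""
    # Geometric head slopes 2^(-8h/n) for h = 1..n (exact for power-of-2 n).
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    # relative[i, j] = j - i: zero on the diagonal, negative for past keys.
    pos = np.arange(seq_len)
    relative = pos[None, :] - pos[:, None]
    # Distant past keys get a larger negative bias; the causal mask
    # handles future positions (j > i) separately.
    return slopes[:, None, None] * relative[None, :, :]

# Biases for 4 heads over an 8-token context: shape (4, 8, 8).
print(alibi_bias(num_heads=4, seq_len=8).shape)
```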
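The front matter tags the checkpoint with `keras` and `pipeline_tag: text-generation`, so the natural entry point is KerasHub's causal-LM API. A minimal usage sketch follows, assuming the checkpoint is published as a KerasHub preset; the preset name `falcon_refinedweb_1b_en` is an assumption, so check the model page for the actual identifier.

```python
import keras_hub

# Load tokenizer + model through the generic causal-LM task loader.
# NOTE: the preset name is an assumption for illustration only.
falcon_lm = keras_hub.models.CausalLM.from_preset("falcon_refinedweb_1b_en")

# Generate a continuation; prompt plus completion is capped at 64 tokens.
print(falcon_lm.generate("The falcon is a bird of prey that", max_length=64))
```

Loading through the task class attaches the matching preprocessor, so raw strings can be passed straight to `generate()` without a separate tokenization step.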