suriyagunasekar committed 9e27d7d (1 parent: 046a667): Update README.md

README.md CHANGED
@@ -56,7 +56,7 @@ Given these potential pitfalls, and others not explicitly mentioned, it's essent
 ## Training
 ### Model (phi-1)
 * Architecture: a Transformer-based model with next-word prediction objective
-* Training tokens: 54B tokens (
+* Training tokens: 54B tokens (7B unique tokens)
 * Precision: fp16
 * GPUs: 8 A100
 * Training time: 6 days
@@ -67,7 +67,7 @@ Given these potential pitfalls, and others not explicitly mentioned, it's essent
 * [flash-attention](https://github.com/HazyResearch/flash-attention)
 
 ### License
-The model is licensed under [Research License](
+The model is licensed under [Research License](Research License.docx).
 
 ### Citation
 ```bib
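The diff above only touches two README lines (the training-token count and the license link). For readers of the updated model card, a minimal inference sketch is shown below; it mirrors the fp16 precision listed in the training section, but the Hub repo id `microsoft/phi-1` and the example prompt are assumptions, not part of this commit.

```python
# Minimal sketch: load phi-1 with Hugging Face transformers and generate a completion.
# The repo id "microsoft/phi-1" is an assumption; fp16 mirrors the precision listed in the README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1"  # assumed repo id; adjust if the model lives elsewhere
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# phi-1 is trained with a next-word prediction objective on code,
# so a function signature makes a natural prompt.
prompt = 'def print_prime(n):\n    """Print all primes between 1 and n."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```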