suriyagunasekar committed 9e27d7d (1 parent: 046a667): Update README.md

README.md CHANGED
@@ -56,7 +56,7 @@ Given these potential pitfalls, and others not explicitly mentioned, it's essent
 ## Training
 ### Model (phi-1)
 * Architecture: a Transformer-based model with next-word prediction objective
-* Training tokens: 54B tokens (
+* Training tokens: 54B tokens (7B unique tokens)
 * Precision: fp16
 * GPUs: 8 A100
 * Training time: 6 days
@@ -67,7 +67,7 @@ Given these potential pitfalls, and others not explicitly mentioned, it's essent
 * [flash-attention](https://github.com/HazyResearch/flash-attention)
 
 ### License
-The model is licensed under [Research License](
+The model is licensed under [Research License](Research License.docx).
 
 ### Citation
 ```bib
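The diff above only touches two README lines (the training-token count and the license link). For readers of the updated model card, a minimal inference sketch is shown below; it mirrors the fp16 precision listed in the training section, but the Hub repo id `microsoft/phi-1` and the example prompt are assumptions, not part of this commit.

```python
# Minimal sketch: load phi-1 with Hugging Face transformers and generate a completion.
# The repo id "microsoft/phi-1" is an assumption; fp16 mirrors the precision listed in the README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1"  # assumed repo id; adjust if the model lives elsewhere
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# phi-1 is trained with a next-word prediction objective on code,
# so a function signature makes a natural prompt.
prompt = 'def print_prime(n):\n    """Print all primes between 1 and n."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```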