Update README.md
README.md
CHANGED
@@ -26,4 +26,8 @@ with torch.no_grad():
 print(embeddings.shape) # (1, 11, 1280)
 ```

+Because we trained in mixed-precision float16, the float16 weights produce outputs closer to those of the float32 weights than the bfloat16 weights do.
+MSE over 1000 sequences vs. the float32 weights:
+Average MSE for FP16: 0.00000140
+Average MSE for BF16: 0.00004125
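
The FP16/BF16 comparison added in this change can be illustrated with a small, self-contained sketch. It is not the author's evaluation script and does not load the model: it rounds 1000 random float32 values to each format using only the standard library and reports the mean squared error against the float32 reference. The helper names (`to_float16`, `to_bfloat16`, `mse`) and the random inputs are assumptions made for illustration. The sketch only shows the metric and one general property of the formats: float16 keeps 10 mantissa bits and bfloat16 keeps 7, so for values inside float16's range its rounding error is smaller. The figures in the README come from real model outputs and will differ.

```python
import random
import struct


def to_float32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_float16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16: keep the top 16 bits of the float32 encoding,
    rounding to nearest with ties to even. NaN/inf handling is omitted."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical stand-in for model outputs: 1000 float32 values in [-1, 1].
rng = random.Random(0)
reference = [to_float32(rng.uniform(-1.0, 1.0)) for _ in range(1000)]

mse_fp16 = mse(reference, [to_float16(v) for v in reference])
mse_bf16 = mse(reference, [to_bfloat16(v) for v in reference])

print(f"Average MSE for FP16: {mse_fp16:.3e}")
print(f"Average MSE for BF16: {mse_bf16:.3e}")
```

To reproduce the README's numbers one would instead run the same inputs through the float32, float16, and bfloat16 copies of the model and apply `mse` to the resulting embeddings.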