lhallee committed on
Commit
4f950cb
·
verified ·
1 Parent(s): 1542bb0

Update README.md

Files changed (1)
  1. README.md +4 -0
README.md CHANGED
@@ -26,4 +26,8 @@ with torch.no_grad():
  print(embeddings.shape) # (1, 11, 1280)
  ```
 
+ Because we trained in mixed-precision float16, float16 inference produces outputs closer to those of the float32 weights than bfloat16 does.
+ MSE against the outputs of the float32 weights, averaged over 1000 sequences:
+ Average MSE for FP16: 0.00000140
+ Average MSE for BF16: 0.00004125
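
The gap between the two figures above is in line with how the formats are laid out: float16 keeps 10 mantissa bits, while bfloat16 keeps 7. The sketch below is only an illustration of that rounding effect, not the evaluation behind the numbers in this commit. It does not load the model. It assumes synthetic float32 values drawn uniformly from [-1, 1] as a stand-in for real activations, and the helper names (`round_to_fp16`, `round_to_bf16`, `mse`) are hypothetical.

```python
import random
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_fp16(x):
    """Round to the nearest IEEE float16 value (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_bf16(x):
    """Round a float32 value to bfloat16 by keeping its upper 16 bits.

    Uses round-to-nearest-even on the discarded lower 16 bits.
    Assumes x is finite and far from the float32 maximum.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Synthetic float32 "reference" values (an assumption, not model outputs).
reference = [to_float32(random.uniform(-1.0, 1.0)) for _ in range(10_000)]

fp16_mse = mse(reference, [round_to_fp16(v) for v in reference])
bf16_mse = mse(reference, [round_to_bf16(v) for v in reference])

print(f"MSE for FP16 rounding: {fp16_mse:.3e}")
print(f"MSE for BF16 rounding: {bf16_mse:.3e}")
```

This captures only the per-value rounding error of each format. The figures reported in the README also reflect error accumulated through the network's layers, so the absolute values will differ.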