RE-ADD float32 please.

#3
by ctranslate2-4you - opened

A lot of us prefer to use float32 versions of these kinds of models. The reason is that if a model is originally in float32, we can convert it at runtime (e.g. using bitsandbytes) into either float16 or bfloat16, depending on whether the user's GPU has CUDA compute capability 8.0 or above (as required for native bfloat16 support). Also, having it in float32 allows users to run it on a CPU natively, rather than having to convert a bfloat16 model to float32 at runtime (as vanilla CPU usage requires float32).
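To make the runtime choice concrete, here is a minimal sketch of the selection logic. The function name `pick_dtype` and the `min_bf16_capability` default of (8, 0) are my own assumptions for illustration, not something from this repository; adjust the threshold to whatever your stack requires.

```python
from typing import Optional, Tuple


def pick_dtype(
    capability: Optional[Tuple[int, int]],
    min_bf16_capability: Tuple[int, int] = (8, 0),  # assumed threshold, adjust as needed
) -> str:
    """Return the name of the dtype to cast float32 weights to at load time.

    `capability` is the (major, minor) CUDA compute capability of the GPU,
    or None when running on a CPU.
    """
    if capability is None:
        return "float32"  # plain CPU inference: keep full precision
    if capability >= min_bf16_capability:  # tuples compare element by element
        return "bfloat16"
    return "float16"  # older GPUs without native bfloat16


print(pick_dtype(None))    # CPU
print(pick_dtype((7, 5)))  # e.g. a Turing-class GPU
print(pick_dtype((8, 6)))  # e.g. an Ampere-class GPU
```

With PyTorch installed, the capability tuple could come from `torch.cuda.get_device_capability()` when `torch.cuda.is_available()` is true, and None otherwise.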

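Any time a model is converted from float32 to bfloat16/float16 there is accuracy loss. Likewise, if a model is converted to float16 from bfloat16 (or vice versa) there is accuracy loss, since the two formats split their 16 bits differently between exponent and mantissa: float16 has 5 exponent bits and 10 mantissa bits, while bfloat16 has 8 exponent bits and 7 mantissa bits.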
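Here is a rough, standard-library-only sketch of what happens when a float32 value is narrowed to 16 bits, and when a value is moved between the two 16-bit formats. The helper names are mine, the bfloat16 rounding is emulated by hand (round-to-nearest-even on the top 16 bits of the float32 pattern, finite values only), and the sample values are arbitrary.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_float16(x: float) -> float:
    """Round to IEEE half precision (5 exponent bits, 10 mantissa bits)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses out-of-range values; a hardware cast would give +/-inf.
        return math.copysign(math.inf, x)


def to_bfloat16(x: float) -> float:
    """Round to bfloat16 (8 exponent bits, 7 mantissa bits). Finite inputs only."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # keep only the top 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


w = to_float32(0.1)         # a float32 "weight"
h = to_float16(w)           # the same weight stored as float16
big = to_bfloat16(100000.0) # in range for bfloat16, out of range for float16

print("float32 :", repr(w))
print("bfloat16:", repr(to_bfloat16(w)), "error", abs(to_bfloat16(w) - w))
print("float16 :", repr(h), "error", abs(h - w))
print("float16 -> bfloat16 changes the value:", to_bfloat16(h) != h)
print("bfloat16 -> float16 of", big, "gives", to_float16(big))
```

The last two lines show the two failure modes: going from float16 to bfloat16 drops mantissa bits, and going from bfloat16 to float16 can overflow because float16 has a much smaller range.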

In summary, there are still use cases for keeping the model in float32. What I'm asking for is:

(1) Keep float32 versions on the repository for this model and any other Granite models. This is IN ADDITION to float16, bfloat16, or whatever other precisions you want to have.

(2) This allows people to download specific versions that they want to use.

@ctranslate2-4you it's still in the commit history.

but yes, I agree.

IBM Granite org

@ctranslate2-4you the model was saved in bf16, so the fp32 model is just an upcast.
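To illustrate the upcast point: a minimal sketch, using only the standard library and a hand-rolled bfloat16 rounding helper (my own naming, finite values only), showing that storing a bf16 value as fp32 is exact but does not bring back the bits that were dropped when the model was saved.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the 32-bit pattern of x stored as float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def round_to_bfloat16(x: float) -> float:
    """Emulate a bfloat16 cast: keep the top 16 bits of the float32 pattern."""
    bits = f32_bits(x)
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


original = struct.unpack("<f", struct.pack("<f", 0.1))[0]  # hypothetical fp32 weight
saved = round_to_bfloat16(original)                         # what a bf16 checkpoint holds
upcast = struct.unpack("<f", struct.pack("<f", saved))[0]   # bf16 -> fp32

print("upcast is exact:", upcast == saved)
print("low 16 bits of the fp32 pattern:", hex(f32_bits(upcast) & 0xFFFF))
print("original precision recovered:", upcast == original)
```

In other words, an fp32 file produced this way is twice the size, with the extra 16 bits of every weight set to zero.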
