fp16 or bf16 version?

#6
by xiangli - opened

Hi, is there a float16 or bfloat16 version? The fp32 model takes too much memory, and the code is written specifically for fp32, so it is not easy to run inference in fp16 or bf16.

We have adjusted the code to work with bfloat16, although note that I have seen this change the model's output slightly.
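A likely reason for the small output change is that bfloat16 keeps float32's 8-bit exponent but only 7 explicit mantissa bits, so every value is rounded to about 2 to 3 significant decimal digits (relative rounding error of at most 2^-8, about 0.39%). The sketch below is not the thread's code: `to_bf16` is a hypothetical helper that emulates round-to-nearest-even float32-to-bfloat16 conversion in pure Python, skipping NaN handling, just to make that precision loss visible.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a Python float.

    Hypothetical illustration only: NaN payloads are not handled specially.
    """
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # bfloat16 is the upper 16 bits of float32 (sign, 8-bit exponent, 7 mantissa bits).
    # Adding this bias before truncating rounds to nearest, breaking ties toward even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    (rounded,) = struct.unpack("<f", struct.pack("<I", bits))
    return rounded


if __name__ == "__main__":
    for value in (3.14159265, 0.1, 1 + 2**-8, 1000.7):
        approx = to_bf16(value)
        rel_err = abs(approx - value) / abs(value)
        print(f"{value!r:>12} -> {approx!r:<12} relative error {rel_err:.2e}")
```

Each individual error is small, but across many layers of matrix multiplications they can accumulate enough to nudge which token is sampled, which fits the slight output differences described above.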


What are the VRAM requirements for this model in fp32 and in bf16? I'm already blown away by the 7B, but I'm curious to try the 72B.

The VRAM requirements for this model are similar to those of the Qwen 7B models. I recommend referring to the Model Size Estimator on Hugging Face for detailed information.
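As a rough rule of thumb, the weights alone need (number of parameters) × (bytes per parameter): 4 bytes in fp32 and 2 bytes in fp16/bf16. The sketch below applies that arithmetic, assuming nominal counts of exactly 7 billion and 72 billion parameters (real checkpoints differ somewhat). It ignores activations, the KV cache, and framework overhead, so actual VRAM use will be higher; `weight_memory_gib` is a hypothetical helper, not part of any library.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory for the model weights only, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Nominal sizes assumed for illustration.
    for size_b in (7, 72):
        for dtype in ("fp32", "bf16"):
            gib = weight_memory_gib(size_b * 1e9, dtype)
            print(f"{size_b}B {dtype}: ~{gib:.1f} GiB for weights alone")
```

This gives roughly 26 GiB (fp32) versus 13 GiB (bf16) for 7B parameters, and roughly 268 GiB versus 134 GiB for 72B, which is why halving the precision matters so much for the larger model.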
