V100 GPU Support?
#5, opened by a deleted user
Could you please support inference on V100 GPUs? Thank you very much.
Hello, thanks for your attention. We are preparing quantized models that can run on a 32 GB V100.
Looking forward to further developments!
https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5-Int8
Thanks. The key problem for the V100 is flash attention. I believe quantized models cannot solve it without changes to the attention code.
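To make the point above concrete: the V100 is a Volta GPU with compute capability 7.0, while FlashAttention-2 kernels target Ampere (8.0) and newer. Quantization changes the weight format, not the attention kernel, so a capability check along these lines would still be needed. This is only a rough sketch, assuming the model's code relies on FlashAttention-2; the function names and the backend labels `"flash_attention_2"` and `"fallback"` are hypothetical and not taken from the InternVL code.

```python
# Minimal sketch: choose an attention backend from the GPU compute capability.
# Assumption: FlashAttention-2 needs compute capability >= 8.0 (Ampere or newer).
# The V100 (Volta) reports 7.0, so it would take the fallback path.
MIN_FLASH_ATTN2_CAPABILITY = (8, 0)


def pick_attention_backend(capability):
    """Return a hypothetical backend label for a (major, minor) capability."""
    if tuple(capability) >= MIN_FLASH_ATTN2_CAPABILITY:
        return "flash_attention_2"
    # Hypothetical label for a plain PyTorch attention implementation.
    return "fallback"


def detect_capability(device_index=0):
    """Query a CUDA device; return None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # Returns a (major, minor) tuple, e.g. (7, 0) on a V100.
    return torch.cuda.get_device_capability(device_index)
```

On a V100, `pick_attention_backend(detect_capability())` would return `"fallback"`, which is why the attention code itself would need a non-flash path regardless of whether the weights are Int8.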
A deleted user changed the discussion status to closed.