Run inference in CPU
#1 opened by hythythyt3
Hello, is running this model on CPU/RAM possible?
Yes. You will need a few modifications:
- comment out the `.cuda()` calls in /root/.cache/huggingface/modules/transformers_modules/OpenGVLab/Mini-InternVL-Chat-4B-V1-5/6f97087daec17e4b033d4d846c0b64c09c4268cd/modeling_internvl_chat.py, and make sure your demo code does not call `.cuda()` either
- change `use_flash_attn` to `false` in /root/.cache/huggingface/hub/models--OpenGVLab--Mini-InternVL-Chat-4B-V1-5/snapshots/6f97087daec17e4b033d4d846c0b64c09c4268cd/config.json, since FlashAttention requires CUDA (see the sketch below)
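A minimal CPU-only loading sketch, assuming the model's remote code accepts `use_flash_attn` as a `from_pretrained` keyword (as in the InternVL demo code); this avoids editing the cached files directly:

```python
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/Mini-InternVL-Chat-4B-V1-5"

# Load on CPU: no .cuda()/.to("cuda") calls, and FlashAttention disabled
# (it is CUDA-only). bfloat16 keeps memory use reasonable; switch to
# torch.float32 if your CPU/PyTorch build handles bfloat16 poorly.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    use_flash_attn=False,  # same effect as setting use_flash_attn to false in config.json
).eval()

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

# When preprocessing images, keep pixel_values on the CPU as well,
# i.e. drop the pixel_values.cuda() call from the GPU demo code.
```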
I cannot run it in LMDeploy. Which inference engine should we use? Any quants?
Please refer to https://internvl.readthedocs.io/en/latest/internvl2.0/deployment.html
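For reference, the linked guide serves InternVL models through LMDeploy's `pipeline` API. A minimal sketch along those lines (whether this particular model and a quantized variant are supported is covered in the linked docs, so treat the model name and defaults here as assumptions):

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Let LMDeploy pick a backend for the model; adjust per the deployment guide.
pipe = pipeline("OpenGVLab/Mini-InternVL-Chat-4B-V1-5")

image = load_image(
    "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg"
)
response = pipe(("describe this image", image))
print(response.text)
```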
czczup changed discussion status to closed