Run inference in CPU
#1 opened by hythythyt3
Hello, is running this model on CPU/RAM possible?
Yes. You will need a few modifications:
- comment out the `.cuda()` calls in /root/.cache/huggingface/modules/transformers_modules/OpenGVLab/Mini-InternVL-Chat-4B-V1-5/6f97087daec17e4b033d4d846c0b64c09c4268cd/modeling_internvl_chat.py, and make sure your demo code does not call `.cuda()` either
- change `use_flash_attn` to `false` in /root/.cache/huggingface/hub/models--OpenGVLab--Mini-InternVL-Chat-4B-V1-5/snapshots/6f97087daec17e4b033d4d846c0b64c09c4268cd/config.json, since FlashAttention requires CUDA (see the sketch below)
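A minimal CPU-only loading sketch, assuming the model's remote code accepts `use_flash_attn` as a `from_pretrained` keyword (as in the InternVL demo code); this avoids editing the cached files directly:

```python
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/Mini-InternVL-Chat-4B-V1-5"

# Load on CPU: no .cuda()/.to("cuda") calls, and FlashAttention disabled
# (it is CUDA-only). bfloat16 keeps memory use reasonable; switch to
# torch.float32 if your CPU/PyTorch build handles bfloat16 poorly.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    use_flash_attn=False,  # same effect as setting use_flash_attn to false in config.json
).eval()

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

# When preprocessing images, keep pixel_values on the CPU as well,
# i.e. drop the pixel_values.cuda() call from the GPU demo code.
```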
I cannot run it in LMDeploy. Which inference engine should we use? Any quants?
Please refer to https://internvl.readthedocs.io/en/latest/internvl2.0/deployment.html
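For reference, the linked guide serves InternVL models through LMDeploy's `pipeline` API. A minimal sketch along those lines (whether this particular model and a quantized variant are supported is covered in the linked docs, so treat the model name and defaults here as assumptions):

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Let LMDeploy pick a backend for the model; adjust per the deployment guide.
pipe = pipeline("OpenGVLab/Mini-InternVL-Chat-4B-V1-5")

image = load_image(
    "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg"
)
response = pipe(("describe this image", image))
print(response.text)
```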
czczup changed discussion status to closed