AI4Chem/ChemVLM-26B · Multi-GPU Inference Error (Expected all tensors to be on the same device, but found at least two devices; Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed)

Hi!
I am running inference with AI4Chem/ChemVLM-26B on four NVIDIA 4090 GPUs. I load and use the model as follows:

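import torch
from transformers import AutoModel, AutoTokenizer

# `path` (the checkpoint location) and `load_image` (the image
# preprocessing helper) are defined elsewhere and not shown here.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map='auto').eval()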

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
# set the max number of tiles in `max_num`
pixel_values = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()

generation_config = dict(
    num_beams=1,
    max_new_tokens=512,
    do_sample=False,
)

# single-round single-image conversation
question = "请详细描述图片" # Please describe the picture in detail
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(question, response)

However, it gives the error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3!

How can I solve the problem?
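
To help narrow this down, the placement chosen by device_map='auto' can be listed per device. This is only a minimal sketch: it assumes the loaded model exposes `hf_device_map` (which transformers sets when a model is loaded with a device_map), and `summarize_device_map` is a hypothetical helper name, not part of any library.

```python
from collections import defaultdict


def summarize_device_map(device_map):
    """Group module names by the device they were assigned to."""
    by_device = defaultdict(list)
    for module_name, device in device_map.items():
        by_device[device].append(module_name)
    return dict(by_device)


# Hypothetical usage with the model loaded above:
#   for device, modules in summarize_device_map(model.hf_device_map).items():
#       print(device, modules)

# Stand-alone demonstration on a small hand-written map.
example = {"vision_model": 0, "mlp1": 0, "language_model.model.layers.0": 1}
for device, modules in summarize_device_map(example).items():
    print(device, modules)
```

If the vision model, the embeddings, and the output head end up on different devices, that would be consistent with the cuda:0 / cuda:3 mismatch in the error above.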

I also looked through the InternVL issues and tried setting the device_map manually when loading the model, as follows:

device_map = {
    'vision_model': 0,
    'mlp1': 0,
    'language_model.model.tok_embeddings': 0,
    'language_model.model.norm': 0,
    'language_model.output.weight': 0
}
for i in range(16):
    device_map[f'language_model.model.layers.{i}'] = 1
for i in range(16, 32):
    device_map[f'language_model.model.layers.{i}'] = 2
for i in range(32, 48):
    device_map[f'language_model.model.layers.{i}'] = 3
print(device_map)
# device_map = 'auto'
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map=device_map).eval()
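
The same layout can also be generated by a small helper instead of being written by hand. This is a sketch of the mapping above, not a confirmed fix: the module names are copied from my map, the default of 48 decoder layers is taken from the ranges I used, and `build_device_map` is a hypothetical name.

```python
def build_device_map(num_layers=48, num_gpus=4):
    """Keep the vision tower, projector, embeddings, final norm and output
    head on GPU 0, and split the decoder layers over GPUs 1..num_gpus-1."""
    if num_gpus < 2:
        raise ValueError("this layout needs at least 2 GPUs")
    device_map = {
        'vision_model': 0,
        'mlp1': 0,
        'language_model.model.tok_embeddings': 0,
        'language_model.model.norm': 0,
        'language_model.output.weight': 0,
    }
    worker_gpus = num_gpus - 1
    per_gpu = -(-num_layers // worker_gpus)  # ceiling division
    for i in range(num_layers):
        device_map[f'language_model.model.layers.{i}'] = 1 + i // per_gpu
    return device_map


device_map = build_device_map(num_layers=48, num_gpus=4)
print(device_map)
```

With the default arguments this reproduces the hand-written map: layers 0-15 on GPU 1, 16-31 on GPU 2, and 32-47 on GPU 3.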

However, it gives another error:

/opt/conda/conda-bld/pytorch_1724789172399/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [114,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
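
Because CUDA kernels are launched asynchronously, the Python traceback for a device-side assertion like this one may not point at the operation that actually failed. A minimal sketch for getting a more reliable traceback, assuming the variable is set before CUDA is initialized (safest is before importing torch):

```python
import os

# Force synchronous kernel launches so the failing op shows up in the
# Python traceback. Debugging only: this slows execution down.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])

# The model loading and model.chat(...) call from above would follow here.
```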

Could you please take a look at this problem, and in particular at how to load and run the model on multiple GPUs?

Thank you very much!