Fine-tuning ContactDoctor/Bio-Medical-MultiModal-Llama-3-8B-V1
#5
by eltorio - opened
Hi @SrikanthChellappa ,
I started to fine-tune a PEFT model for radiology. As a proof of concept, I created eltorio/IDEFICS3_ROCO based on Idefics3. Rather than fine-tuning Idefics3 with medical knowledge before using the medical imagery dataset, I’d like to test fine-tuning with your ContactDoctor/Bio-Medical-MultiModal-Llama-3-8B-V1 model.
My problem is that I cannot correctly tokenize the image. My original data collator was:
```python
class MyDataCollator:
    def __init__(self, processor):
        self.processor = processor
        # Look up the id of the "<image>" placeholder among the additional special tokens
        self.image_token_id = processor.tokenizer.additional_special_tokens_ids[
            processor.tokenizer.additional_special_tokens.index("<image>")
        ]

    def __call__(self, samples):
        texts = []
        images = []
        for sample in samples:
            image = sample["image"]
            answer = sample["caption"]
            messages = [
                {
                    "role": "system",
                    "content": [
                        {"type": "text", "text": prompt}  # `prompt` is defined elsewhere
                    ],
                },
                {
                    "role": "user",
                    "content": [
                        {"type": "image"},
                    ],
                },
                {
                    "role": "assistant",
                    "content": [
                        {"type": "text", "text": answer}
                    ],
                },
            ]
            text = self.processor.apply_chat_template(messages, add_generation_prompt=False)
            texts.append(text.strip())
            images.append([image.convert("RGB")])

        batch = self.processor(text=texts, images=images, return_tensors="pt", padding=True)

        # Copy input_ids and overwrite pad-token positions with the image token id
        labels = batch["input_ids"].clone()
        labels[labels == self.processor.tokenizer.pad_token_id] = self.image_token_id
        batch["labels"] = labels
        return batch
```
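For clarity, the last three lines of the collator replace pad-token positions in the copied `input_ids` with the image token id. A torch-free illustration of that step (the ids below are made up for the example, not taken from any real tokenizer):

```python
# Toy illustration of the label-building step in the collator above:
# pad-token positions in the copied input_ids are overwritten with
# the image token id before being used as labels.
PAD_ID, IMAGE_ID = 0, 32001  # hypothetical ids

input_ids = [128000, 32001, 17, 25, PAD_ID, PAD_ID]
labels = [IMAGE_ID if tok == PAD_ID else tok for tok in input_ids]
print(labels)  # → [128000, 32001, 17, 25, 32001, 32001]
```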
Obviously, the `image_token_id` is different for your model, but your tokenizer never finds any image token at all: it raises an error because the `re` match for the image placeholder returns nothing.
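In case it helps anyone debugging the same issue: the placeholder string varies between multimodal models, so hard-coding `"<image>"` can fail. A minimal sketch of a lookup that tries several candidate placeholders (the candidate strings here are guesses; the authoritative source is the model's own tokenizer config):

```python
def find_image_token(special_tokens, candidates=("<image>", "<|image|>", "<|image_pad|>")):
    """Return the first candidate image placeholder present in the
    tokenizer's additional special tokens, or None if absent."""
    for token in candidates:
        if token in special_tokens:
            return token
    return None

# Example with a hypothetical additional_special_tokens list:
tokens = ["<|begin_of_text|>", "<|image|>", "<|eot_id|>"]
print(find_image_token(tokens))  # → <|image|>
```

If this returns `None`, the processor likely inserts its image placeholder in some other way, and the `<image>`-based lookup in the collator needs to be replaced entirely.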
There must be something I have misunderstood.
Do you have a sample data collator with images for your model? Thank you for sharing your model with the community.
Best,
Ronan