---
license: apache-2.0
datasets:
- unsloth/Radiology_mini
language:
- en
base_model:
- unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit
library_name: transformers
tags:
- text-generation-inference
- transformers
- vision
---

# Llama-3.2-11B X-Ray Analysis Model (v1)

This model is a fine-tuned version of [unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit), built to analyze medical radiology images (X-rays, CT scans, ultrasounds). It was trained on a subset of the ROCO radiology dataset ([unsloth/Radiology_mini](https://huggingface.co/datasets/unsloth/Radiology_mini)) to generate image descriptions, acting as an expert radiologist.

## Training

The model was fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) for 2x faster training and reduced memory usage. Key training details:

* **Base model:** `unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit`
* **Dataset:** `unsloth/Radiology_mini`
* **LoRA adapters:** Parameter-efficient fine-tuning was used, training only a small fraction of the model's parameters.
  * `r = 16`
  * `lora_alpha = 16`
  * `lora_dropout = 0`
* **Training hyperparameters:**
  * `learning_rate = 2e-4`
  * `per_device_train_batch_size = 2`
  * `gradient_accumulation_steps = 4`
  * `max_steps = 30` (or `num_train_epochs = 1` for a full run)
  * `optimizer = adamw_8bit`
* **Fine-tuning configuration:**
  * `finetune_vision_layers = False`
  * `finetune_language_layers = True`
  * `finetune_attention_modules = True`
  * `finetune_mlp_modules = True`

A sketch of how this configuration maps onto Unsloth's fine-tuning API appears after the usage example below.

## Usage

The model takes an image and a text prompt as input and generates a text description. The expected input format is a conversation, as shown below:

```python
from unsloth import FastVisionModel
import torch
from PIL import Image
from transformers import TextStreamer

# Load the model (replace "YLX1965/llama3-2-11b-xray-v1" with your model path if loading locally)
model, tokenizer = FastVisionModel.from_pretrained(
    "YLX1965/llama3-2-11b-xray-v1",
    load_in_4bit = True,  # Set to False for 16-bit loading
)
FastVisionModel.for_inference(model)

# Example image and instruction (replace with your own image path)
# from datasets import load_dataset
# dataset = load_dataset("unsloth/Radiology_mini", split="train")
# image = dataset[0]["image"]
image = Image.open("path/to/your/image.jpg")  # Example
instruction = "You are an expert radiologist. Please describe accurately what you see in this image."

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction}
    ]}
]
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt = True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens = False,
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)
```
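The sampling settings above (`temperature = 1.5` together with `min_p = 0.1`) follow the defaults used in Unsloth's vision fine-tuning examples; for more deterministic report-style output, lower the temperature or use greedy decoding.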
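For reference, here is a minimal sketch of how the training configuration listed above maps onto Unsloth's fine-tuning API. This is not the exact training script: the `warmup_steps`, `random_state`, `bias`, `output_dir`, precision flags, and the `convert_to_conversation` helper are assumptions not stated in this card, filled in from Unsloth's standard vision fine-tuning recipe.

```python
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

# Load the 4-bit base model, then attach LoRA adapters to the language
# layers only, matching the fine-tuning configuration listed above.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit",
    load_in_4bit = True,
    use_gradient_checkpointing = "unsloth",
)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = False,
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",        # assumption: not stated in the card
    random_state = 3407,  # assumption: not stated in the card
)

# Convert each dataset row (an "image" and its "caption") into the
# chat format that the vision data collator expects.
instruction = "You are an expert radiologist. Please describe accurately what you see in this image."

def convert_to_conversation(sample):
    return {"messages": [
        {"role": "user", "content": [
            {"type": "text", "text": instruction},
            {"type": "image", "image": sample["image"]},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": sample["caption"]},
        ]},
    ]}

dataset = load_dataset("unsloth/Radiology_mini", split = "train")
converted_dataset = [convert_to_conversation(sample) for sample in dataset]

FastVisionModel.for_training(model)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer),
    train_dataset = converted_dataset,
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,   # assumption: not stated in the card
        max_steps = 30,     # or num_train_epochs = 1 for a full run
        learning_rate = 2e-4,
        optim = "adamw_8bit",
        fp16 = not is_bf16_supported(),
        bf16 = is_bf16_supported(),
        logging_steps = 1,
        output_dir = "outputs",
        # Required when fine-tuning vision models with this collator:
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_seq_length = 2048,
    ),
)
trainer.train()
```

Freezing the vision layers (`finetune_vision_layers = False`) keeps the pretrained image encoder intact and adapts only the language side, which is a common choice when the downstream task changes the style of the generated text more than the visual domain.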