---
license: apache-2.0
datasets:
- unsloth/Radiology_mini
language:
- en
base_model:
- unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit
library_name: transformers
tags:
- text-generation-inference
- transformers
- vision
---

# Llama-3.2-11B X-Ray Analysis Model (v1)

This model is a fine-tuned version of [unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit), built to analyze medical radiology images (X-rays, CT scans, ultrasounds). It was trained on a subset of the ROCO radiology dataset ([unsloth/Radiology_mini](https://huggingface.co/datasets/unsloth/Radiology_mini)) to generate image descriptions, acting as an expert radiologist.

## Training

The model was fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) for 2x faster training and reduced memory usage. Key training details:

* **Base model:** `unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit`
* **Dataset:** `unsloth/Radiology_mini`
* **LoRA adapters:** Parameter-efficient fine-tuning was used, training only a small fraction of the model's parameters.
  * `r = 16`
  * `lora_alpha = 16`
  * `lora_dropout = 0`
* **Training hyperparameters:**
  * `learning_rate = 2e-4`
  * `per_device_train_batch_size = 2`
  * `gradient_accumulation_steps = 4`
  * `max_steps = 30` (or `num_train_epochs = 1` for a full run)
  * `optimizer = adamw_8bit`
* **Fine-tuning configuration:**
  * `finetune_vision_layers = False`
  * `finetune_language_layers = True`
  * `finetune_attention_modules = True`
  * `finetune_mlp_modules = True`

A sketch of how this configuration maps onto Unsloth's fine-tuning API appears after the usage example below.

## Usage

The model takes an image and a text prompt as input and generates a text description. The expected input format is a conversation, as shown below:

```python
from unsloth import FastVisionModel
import torch
from PIL import Image
from transformers import TextStreamer

# Load the model (replace "YLX1965/llama3-2-11b-xray-v1" with your model path if loading locally)
model, tokenizer = FastVisionModel.from_pretrained(
    "YLX1965/llama3-2-11b-xray-v1",
    load_in_4bit = True,  # Set to False for 16-bit loading
)
FastVisionModel.for_inference(model)

# Example image and instruction (replace with your own image path)
# from datasets import load_dataset
# dataset = load_dataset("unsloth/Radiology_mini", split="train")
# image = dataset[0]["image"]
image = Image.open("path/to/your/image.jpg")  # Example
instruction = "You are an expert radiologist. Please describe accurately what you see in this image."

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction}
    ]}
]
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt = True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens = False,
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)
```
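The sampling settings above (`temperature = 1.5` together with `min_p = 0.1`) follow the defaults used in Unsloth's vision fine-tuning examples; for more deterministic report-style output, lower the temperature or use greedy decoding.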
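For reference, here is a minimal sketch of how the training configuration listed above maps onto Unsloth's fine-tuning API. This is not the exact training script: the `warmup_steps`, `random_state`, `bias`, `output_dir`, precision flags, and the `convert_to_conversation` helper are assumptions not stated in this card, filled in from Unsloth's standard vision fine-tuning recipe.

```python
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

# Load the 4-bit base model, then attach LoRA adapters to the language
# layers only, matching the fine-tuning configuration listed above.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit",
    load_in_4bit = True,
    use_gradient_checkpointing = "unsloth",
)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = False,
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",        # assumption: not stated in the card
    random_state = 3407,  # assumption: not stated in the card
)

# Convert each dataset row (an "image" and its "caption") into the
# chat format that the vision data collator expects.
instruction = "You are an expert radiologist. Please describe accurately what you see in this image."

def convert_to_conversation(sample):
    return {"messages": [
        {"role": "user", "content": [
            {"type": "text", "text": instruction},
            {"type": "image", "image": sample["image"]},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": sample["caption"]},
        ]},
    ]}

dataset = load_dataset("unsloth/Radiology_mini", split = "train")
converted_dataset = [convert_to_conversation(sample) for sample in dataset]

FastVisionModel.for_training(model)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer),
    train_dataset = converted_dataset,
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,   # assumption: not stated in the card
        max_steps = 30,     # or num_train_epochs = 1 for a full run
        learning_rate = 2e-4,
        optim = "adamw_8bit",
        fp16 = not is_bf16_supported(),
        bf16 = is_bf16_supported(),
        logging_steps = 1,
        output_dir = "outputs",
        # Required when fine-tuning vision models with this collator:
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_seq_length = 2048,
    ),
)
trainer.train()
```

Freezing the vision layers (`finetune_vision_layers = False`) keeps the pretrained image encoder intact and adapts only the language side, which is a common choice when the downstream task changes the style of the generated text more than the visual domain.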