---
license: apache-2.0
language:
- zh
- en
base_model:
- meta-llama/Llama-3.2-11B-Vision-Instruct
tags:
- llama
- lora
- chinese
- zh
- mllama
pipeline_tag: image-text-to-text
library_name: peft
---

# Llama-3.2-Vision-chinese-lora

- base model: [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)

## Features

- Fine-tuned on a large amount of high-quality Chinese text and VQA data, significantly enhancing the model's Chinese OCR capabilities.

## Use with transformers

```python
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
from PIL import Image

# Base model ID and LoRA adapter ID
base_model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
lora_model_id = "Kadins/Llama-3.2-Vision-chinese-lora"

# Load the processor
processor = AutoProcessor.from_pretrained(base_model_id)

# Load the base model
base_model = MllamaForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.float16  # Use torch.bfloat16 if your hardware supports it
).eval()

# Load the LoRA adapter and apply it to the base model
model = PeftModel.from_pretrained(base_model, lora_model_id)

# Optionally, merge the LoRA weights into the base model for faster inference
model = model.merge_and_unload()

# Load an example image (replace 'path_to_image.jpg' with your image file)
image_path = 'path_to_image.jpg'
image = Image.open(image_path).convert("RGB")

# User prompt in Chinese ("Please describe this image.")
user_prompt = "请描述这张图片。"

# Build the chat message; the chat template inserts the image placeholder
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": user_prompt},
        ],
    }
]

# Apply the chat template to create the prompt
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# Prepare the inputs; the template has already added the special tokens
inputs = processor(
    images=image,
    text=prompt,
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device)

# Generate the model's response
output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt
response = processor.decode(
    output[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True
)

# Print the assistant's response
print("Assistant:", response)
```
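
After calling `merge_and_unload()`, you can save the merged weights and reload them later without `peft`. A minimal sketch; the local directory name is only a placeholder:

```python
# Save the merged model and processor to a local directory
# ("llama-3.2-vision-chinese-merged" is a placeholder path)
save_dir = "llama-3.2-vision-chinese-merged"
model.save_pretrained(save_dir)
processor.save_pretrained(save_dir)

# Later, reload directly with transformers -- no peft required
# model = MllamaForConditionalGeneration.from_pretrained(
#     save_dir, device_map="auto", torch_dtype=torch.float16
# )
```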
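
On GPUs with limited VRAM, the base model can instead be loaded in 4-bit before attaching the adapter. A sketch, assuming `bitsandbytes` is installed; keep the adapter un-merged in this case, since merging into a quantized base is not generally supported:

```python
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor, BitsAndBytesConfig
from peft import PeftModel

# 4-bit quantization config for the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",
    device_map="auto",
    quantization_config=bnb_config,
).eval()

# Attach the LoRA adapter; skip merge_and_unload() with a quantized base
model = PeftModel.from_pretrained(base_model, "Kadins/Llama-3.2-Vision-chinese-lora")
```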