# lora_fine_tuned_phi-4_quantized_vision
This repository contains a fine-tuned version of the **Phi-4** language model specifically adapted for **image-to-text generation**.
The model has been fine-tuned using **LoRA (Low-Rank Adaptation)** on the **FGVC Aircraft** dataset, which consists of images of aircraft with corresponding textual descriptions. This fine-tuning process enables the model to generate more accurate and descriptive captions for aircraft images.
**Key Features:**
* **4-bit Quantization:** The model utilizes 4-bit quantization techniques to reduce its size and memory footprint, making it more efficient to deploy and use.
* **LoRA:** Fine-tuning is performed with LoRA, which allows for efficient adaptation of the model while keeping the number of trainable parameters low.
* **Image Captioning:** The model is specifically trained to generate textual descriptions (captions) for images of aircraft.
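As a rough illustration of why 4-bit quantization matters, here is a back-of-envelope estimate of weight memory (this assumes ~14B parameters for Phi-4 and ignores activations, the KV cache, and quantization overhead):

```python
# Approximate weight memory for a ~14B-parameter model (rough estimate only)
params = 14e9
bytes_fp16 = params * 2    # 16-bit weights: 2 bytes per parameter
bytes_4bit = params * 0.5  # 4-bit weights: 0.5 bytes per parameter

print(f"fp16 : ~{bytes_fp16 / 1e9:.0f} GB")  # ~28 GB
print(f"4-bit: ~{bytes_4bit / 1e9:.0f} GB")  # ~7 GB
```

The roughly 4x reduction is what makes it feasible to load the model on a single consumer GPU.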
**Intended Use Cases:**
* **Image Captioning:** Generate descriptive captions for aircraft images.
* **Aircraft Recognition:** Assist in identifying different types of aircraft based on their visual features.
* **Educational Purposes:** Used as a tool for learning about different aircraft models.
**How to Use:**
You can load this model with Hugging Face Transformers and PEFT:
```python
import torch
from transformers import pipeline, AutoTokenizer, BitsAndBytesConfig, AutoModelForCausalLM
from peft import PeftModel

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("frankmorales2020/lora_fine_tuned_phi-4_quantized_vision")

# Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=bnb_config,
    low_cpu_mem_usage=True,
)

# Attach the fine-tuned LoRA adapter from the Hugging Face Hub
model = PeftModel.from_pretrained(
    base_model,  # pass the quantized base model instance
    "frankmorales2020/lora_fine_tuned_phi-4_quantized_vision",
    device_map={"": 0},
)

# Ensure a pad token is defined for generation
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.generation_config.pad_token_id = tokenizer.pad_token_id

# Create a text generation pipeline
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)

# Generate a caption for an image (replace with your image processing logic)
image_path = "path/to/your/aircraft/image.jpg"
# ... (Add your image loading and preprocessing code here) ...
# `processed_image` should be the representation produced by your preprocessing step
prompt = f"Generate a caption for the following image: {processed_image}"
generated_caption = generator(prompt, max_new_tokens=64)[0]["generated_text"]
print(generated_caption)
```
**Training Data:**
The model was trained on the FGVC Aircraft dataset ([https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/](https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/)).
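For context, a LoRA fine-tune of this kind is configured through PEFT's `LoraConfig`; the hyperparameters below are illustrative placeholders, not the actual training configuration used for this model:

```python
from peft import LoraConfig

# Illustrative LoRA configuration (hypothetical values, not the ones used here)
lora_config = LoraConfig(
    r=16,                       # rank of the low-rank update matrices
    lora_alpha=32,              # scaling factor applied to the update
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```

Only the small adapter matrices defined here are trained; the 4-bit base weights stay frozen, which is what keeps the trainable parameter count low.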
**Evaluation:**
The model was evaluated using the BLEU metric on a held-out test set from the FGVC Aircraft dataset.
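BLEU scores a generated caption by its clipped n-gram overlap with the reference, discounted by a brevity penalty for captions shorter than the reference. A minimal, self-contained sketch follows (an illustration only, not the evaluation script used here; libraries such as `sacrebleu` or `nltk` are the usual choice in practice):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Plain (unsmoothed) sentence-level BLEU with uniform n-gram weights."""
    ref, cand = reference.split(), candidate.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any zero n-gram precision zeroes out unsmoothed BLEU
    # Brevity penalty discourages captions shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

caption = "a boeing 747 parked on the runway"
print(sentence_bleu(caption, caption))  # identical strings score 1.0
```

Note that unsmoothed BLEU returns 0 whenever a candidate shares no 4-grams with the reference, which is why smoothed variants are preferred for short captions.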
**Limitations:**
* The model is specifically fine-tuned for aircraft images and may not generalize well to other types of images.
* The generated captions may sometimes be overly generic or lack fine-grained details.
**Future Work:**
* Fine-tune the model on a larger and more diverse dataset of images.
* Explore more advanced image encoding techniques to improve the model's understanding of visual features.
* Experiment with different decoding strategies to generate more detailed and human-like captions.
**Acknowledgements:**
This work is based on the Phi-4 language model developed by Microsoft and utilizes the Hugging Face Transformers and Datasets libraries.
**Remember to:**
* Replace `"path/to/your/aircraft/image.jpg"` with the actual path to your image.
* Add your image loading and preprocessing code in the designated section of the example above.