sagar007 committed · verified · Commit 9eb4fca · Parent(s): 0c56b2f

Upload README.md with huggingface_hub

Files changed (1): README.md (+67 -3)

README.md CHANGED
---
license: mit
---

# LLaVA-Phi Model

This is a vision-language model that combines Microsoft's Phi-1.5 language model with a CLIP vision encoder for image understanding.
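
The model card does not include the bridging code itself, but the usual LLaVA-style recipe is: CLIP encodes the image, a small projection maps the image features into Phi-1.5's embedding space, and the projected image tokens are placed in front of the text embeddings. The sketch below only illustrates that idea; the class name `VisionProjector` and the dimensions are illustrative assumptions, not this repository's actual implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: how CLIP image features are typically bridged
# into a Phi-style language model in LLaVA-like architectures.
# Module name and dimensions are assumptions, not this repo's code.
class VisionProjector(nn.Module):
    def __init__(self, clip_dim: int = 768, phi_dim: int = 2048):
        super().__init__()
        # Map the CLIP feature width to the language model's hidden size.
        self.proj = nn.Linear(clip_dim, phi_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, clip_dim) from the CLIP encoder
        return self.proj(image_features)  # -> (batch, num_patches, phi_dim)

# The projected image tokens are then concatenated in front of the text
# embeddings before being fed to the language model, e.g.:
#   inputs_embeds = torch.cat([image_embeds, text_embeds], dim=1)
```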

## Model Description

- **Base Model**: Microsoft Phi-1.5
- **Vision Encoder**: CLIP ViT-B/32
- **Training**: QLoRA fine-tuning
- **Dataset**: Instruct 150K

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoProcessor
import torch
from PIL import Image

# Load model and tokenizer, plus the CLIP image processor
model = AutoModelForCausalLM.from_pretrained("sagar007/Lava_phi")
tokenizer = AutoTokenizer.from_pretrained("sagar007/Lava_phi")
processor = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")

# For text
def generate_text(prompt):
    inputs = tokenizer(f"human: {prompt}\ngpt:", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# For images
def process_image_and_prompt(image_path, prompt):
    image = Image.open(image_path)
    image_tensor = processor(images=image, return_tensors="pt").pixel_values

    inputs = tokenizer(f"human: <image>\n{prompt}\ngpt:", return_tensors="pt")
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        images=image_tensor,
        max_new_tokens=128,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```
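
As a quick check, the two helpers above can be called as shown below; `example.jpg` is a placeholder path, and the prompts simply follow the `human:`/`gpt:` format used in the snippet.

```python
# Text-only generation
print(generate_text("What is the capital of France?"))

# Image + text generation (replace example.jpg with a real image path)
print(process_image_and_prompt("example.jpg", "Describe this image."))
```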

## Training Details

- Trained using QLoRA (Quantized Low-Rank Adaptation); see the configuration sketch below
- 4-bit quantization for efficiency
- Gradient checkpointing enabled
- Mixed precision training (bfloat16)
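
The training script itself is not part of this model card. The sketch below shows one typical way to set up the ingredients listed above with `bitsandbytes` and `peft` (4-bit NF4 quantization, LoRA adapters, gradient checkpointing, bfloat16). All hyperparameter values and the LoRA target modules are illustrative assumptions, not the values used for this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization (QLoRA-style); values are illustrative
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1_5",
    quantization_config=bnb_config,
)

# Enable gradient checkpointing and prepare the quantized model for training
base_model.gradient_checkpointing_enable()
base_model = prepare_model_for_kbit_training(base_model)

# LoRA adapters on the attention projections (target modules are an assumption)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)

# Mixed-precision (bfloat16) training arguments
training_args = TrainingArguments(
    output_dir="llava-phi-qlora",
    bf16=True,
    gradient_checkpointing=True,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    num_train_epochs=1,
)
```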

## License

MIT License

## Citation

```bibtex
@software{llava_phi_2024,
  author    = {sagar007},
  title     = {LLaVA-Phi: Vision-Language Model},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/sagar007/Lava_phi}
}
```