File size: 4,276 Bytes
5c70504
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7296514
3a69d2f
7296514
5c70504
7296514
5c70504
 
 
 
 
7296514
5c70504
 
 
 
 
7296514
5c70504
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7296514
3a69d2f
5c70504
 
 
 
 
 
 
 
 
 
 
 
 
 
7296514
5c70504
 
 
 
 
 
7296514
5c70504
 
 
 
 
 
 
 
 
 
 
3a69d2f
7296514
5c70504
 
 
 
 
 
 
7296514
5c70504
7296514
5c70504
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
---
language:
- ko
- en
library_name: transformers
base_model: Bllossom/llama-3.2-Korean-Bllossom-AICA-5B
tags:
- vision-language
- korean
- image-to-text
- multilingual
- fashion
- e-commerce
- text-classification
- text-generation-inference
- transformers
- unsloth
- mllama
- lora
datasets:
- hateslopacademy/otpensource_data
inference: true
license: cc-by-4.0
model_name: otpensource-vision-lora
size_categories: 1K<n<10K
task_categories:
- image-to-text
- text-classification
task_ids:
- image-captioning
- sentiment-analysis
---

# otpensource-vision LoRA

## ๋ชจ๋ธ ์„ค๋ช…

**otpensource-vision LoRA**๋Š” *otpensource-vision* ๋ชจ๋ธ์„ ๊ธฐ๋ฐ˜์œผ๋กœ **LoRA (Low-Rank Adaptation)** ๊ธฐ๋ฒ•์„ ํ™œ์šฉํ•˜์—ฌ ํ•™์Šต๋œ ๊ฒฝ๋Ÿ‰ Vision-Language ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ๊ธฐ์กด ๋ชจ๋ธ ๋Œ€๋น„ ์ ์€ ์—ฐ์‚ฐ๋Ÿ‰์œผ๋กœ ํŠน์ • ๋„๋ฉ”์ธ์— ์ตœ์ ํ™”๋œ ๊ฒฐ๊ณผ๋ฅผ ์ œ๊ณตํ•˜๋ฉฐ, ํ•œ๊ตญ์–ด์™€ ์˜์–ด๋ฅผ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค.

### ์ฃผ์š” ํŠน์ง•
- **LoRA ๊ธฐ๋ฐ˜ ๊ฒฝ๋Ÿ‰ ์–ด๋Œ‘ํ„ฐ**: ๊ธฐ์กด ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•˜๋ฉด์„œ๋„ ์ ์€ ์ž์›์œผ๋กœ ์ถ”๊ฐ€ ํ•™์Šต์ด ๊ฐ€๋Šฅ
- **Vision-Language ํƒœ์Šคํฌ ์ง€์›**: ์ด๋ฏธ์ง€๋ฅผ ์ž…๋ ฅ๋ฐ›์•„ ํ…์ŠคํŠธ ์ •๋ณด๋ฅผ ์ƒ์„ฑํ•˜๊ณ , ํ…์ŠคํŠธ ์ž…๋ ฅ๋งŒ์œผ๋กœ ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ ์ˆ˜ํ–‰
- **ํŒจ์…˜ ๋ฐ์ดํ„ฐ๋ฅผ ํ™œ์šฉํ•œ ํ•™์Šต**: *otpensource_data*๋ฅผ ํ™œ์šฉํ•ด ํŒจ์…˜ ์นดํ…Œ๊ณ ๋ฆฌ, ์ƒ‰์ƒ, ๊ณ„์ ˆ ๋“ฑ์˜ ์ •๋ณด๋ฅผ ๋ถ„์„ํ•˜๋Š” ๋ฐ ์ตœ์ ํ™”
- **๋น ๋ฅธ ์ ์šฉ ๋ฐ ํ™•์žฅ์„ฑ**: ๊ธฐ์กด ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •(Fine-tuning)ํ•  ๋•Œ LoRA ์–ด๋Œ‘ํ„ฐ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋น ๋ฅด๊ฒŒ ์ ์šฉ ๊ฐ€๋Šฅ

---

## ๋ชจ๋ธ ์„ธ๋ถ€์‚ฌํ•ญ

### ํ•™์Šต ๋ฐ์ดํ„ฐ
๋ชจ๋ธ ํ•™์Šต์— ์‚ฌ์šฉ๋œ ๋ฐ์ดํ„ฐ์…‹:
- **[otpensource_dataset](https://huggingface.co/datasets/hateslopacademy/otpensource_dataset)**:
  - ์•ฝ 9000๊ฐœ์˜ ํŒจ์…˜ ๋ฐ์ดํ„ฐ๋กœ ๊ตฌ์„ฑ
  - ์˜ท์˜ ์นดํ…Œ๊ณ ๋ฆฌ, ์ƒ‰์ƒ, ๊ณ„์ ˆ, ํŠน์ง•, ์ด๋ฏธ์ง€ URL ๋“ฑ์„ ํฌํ•จํ•˜์—ฌ Vision-Language ํ•™์Šต์— ์ตœ์ ํ™”

### ํ•™์Šต ๋ฐฉ์‹
- **๊ธฐ๋ฐ˜ ๋ชจ๋ธ**: Bllossom/llama-3.2-Korean-Bllossom-AICA-5B
- **์ตœ์ ํ™” ๊ธฐ๋ฒ•**: LoRA ์ ์šฉ
- **GPU ์š”๊ตฌ์‚ฌํ•ญ**: A100 40GB ์ด์ƒ ๊ถŒ์žฅ
- **ํ›ˆ๋ จ ํšจ์œจ์„ฑ**: LoRA๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๊ธฐ์กด ๋ชจ๋ธ ๋Œ€๋น„ 2๋ฐฐ ๋น ๋ฅธ ํ•™์Šต ์ˆ˜ํ–‰

---

## ์ฃผ์š” ์‚ฌ์šฉ ์‚ฌ๋ก€

### Vision-Language ํƒœ์Šคํฌ
1. **์ด๋ฏธ์ง€ ๋ถ„์„ ๋ฐ ์„ค๋ช…**
   - ์ž…๋ ฅ๋œ ์ด๋ฏธ์ง€์—์„œ ์˜ท์˜ ์นดํ…Œ๊ณ ๋ฆฌ, ์ƒ‰์ƒ, ๊ณ„์ ˆ, ํŠน์ง•์„ ์ถ”์ถœํ•˜์—ฌ JSON ํ˜•์‹์œผ๋กœ ๋ฐ˜ํ™˜.
   - ์˜ˆ์‹œ:
     ```json
     {
       "category": "ํŠธ๋ Œ์น˜์ฝ”ํŠธ",
       "gender": "์—ฌ",
       "season": "SS",
       "color": "๋„ค์ด๋น„",
       "material": "",
       "feature": "ํŠธ๋ Œ์น˜์ฝ”ํŠธ"
     }
     ```

2. **ํ…์ŠคํŠธ ๋ถ„์„ ๋ฐ ๋ถ„๋ฅ˜**
   - ํ…์ŠคํŠธ ์ž…๋ ฅ๋งŒ์œผ๋กœ ๊ฐ์ • ๋ถ„์„, ์งˆ๋ฌธ ์‘๋‹ต, ํ…์ŠคํŠธ ์š”์•ฝ ๋“ฑ์˜ ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ ํƒœ์Šคํฌ ์ˆ˜ํ–‰ ๊ฐ€๋Šฅ.

---

## ์ฝ”๋“œ ์˜ˆ์‹œ

### Vision-Language ํƒœ์Šคํฌ

```python
from transformers import MllamaForConditionalGeneration, MllamaProcessor
import torch
from PIL import Image
import requests

model = MllamaForConditionalGeneration.from_pretrained(
  'otpensource-vision-lora',
  torch_dtype=torch.bfloat16,
  device_map='auto'
)
processor = MllamaProcessor.from_pretrained('otpensource-vision-lora')

url = "https://image.msscdn.net/thumbnails/images/prd_img/20240710/4242307/detail_4242307_17205916382801_big.jpg?w=1200"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
  {'role': 'user', 'content': [
    {'type': 'image', 'image': image},
    {'type': 'text', 'text': '์ด ์˜ท์˜ ์ •๋ณด๋ฅผ JSON์œผ๋กœ ์•Œ๋ ค์ค˜.'}
  ]}
]

input_text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(
    image=image,
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=256, temperature=0.1)
print(processor.decode(output[0]))
```

---

## ์—…๋กœ๋“œ๋œ ๋ชจ๋ธ ์ •๋ณด

- **๊ฐœ๋ฐœ์ž**: hateslopacademy
- **๋ผ์ด์„ ์Šค**: CC-BY-4.0
- **LoRA ํ•™์Šต ๋ชจ๋ธ**: otpensource-vision ๊ธฐ๋ฐ˜

์ด ๋ชจ๋ธ์€ [Unsloth](https://github.com/unslothai/unsloth) ๋ฐ Hugging Face TRL ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ™œ์šฉํ•ด ๊ธฐ์กด ๋ชจ๋ธ ๋Œ€๋น„ 2๋ฐฐ ๋น ๋ฅด๊ฒŒ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)