**VARCO-VISION-14B** is a powerful English-Korean Vision-Language Model (VLM) developed through four distinct training phases, culminating in a final preference optimization stage. Designed to excel in both multimodal and text-only tasks, VARCO-VISION-14B not only surpasses other models of similar size in performance but also achieves scores comparable to those of proprietary models. The model currently accepts a single image and accompanying text as input, generating text as output. It supports grounding—the ability to identify the locations of objects within an image—as well as OCR (Optical Character Recognition) to recognize text within images.
- **Developed by:** NC Research, Multimodal Generation Team
- **Technical Report:** [VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models](https://arxiv.org/pdf/2411.19103)
- **Demo Page:** [Coming Soon]()
- **Languages:** Korean, English
- **License:** CC BY-NC 4.0
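As a rough illustration of the single-image-plus-text interface described above, the sketch below shows how a VLM of this kind is typically queried through Hugging Face `transformers`. The repo id, the LLaVA-OneVision-style processor/model classes, and the generation settings are assumptions for illustration, not confirmed by this card; consult the model repository for the exact loading code.

```python
# Minimal usage sketch for an image+text -> text VLM such as VARCO-VISION-14B.
# ASSUMPTIONS (not from this card): the model loads through transformers with a
# LLaVA-OneVision-style processor/model pair; "NCSOFT/VARCO-VISION-14B" is a
# placeholder repo id.

def build_conversation(question: str):
    """One user turn containing a single image plus a text question, in the
    chat-template format used by transformers multimodal processors."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": question},
            ],
        }
    ]


def run_inference(image, question: str, repo_id: str = "NCSOFT/VARCO-VISION-14B"):
    """Load the (assumed) model and generate a text answer for one image."""
    # Heavy imports kept local so the prompt-building helper stays lightweight.
    import torch
    from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

    processor = AutoProcessor.from_pretrained(repo_id)
    model = LlavaOnevisionForConditionalGeneration.from_pretrained(
        repo_id, torch_dtype=torch.float16, device_map="auto"
    )
    # Render the conversation into the model's prompt format.
    prompt = processor.apply_chat_template(
        build_conversation(question), add_generation_prompt=True
    )
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```

A grounding or OCR query would use the same call with a different question string; the model returns its answer as generated text.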