Commit 14947b1
Parent(s): 92c687b
Update README.md (#4)
- Update README.md (52c0c49819dbf5bab7ee90bdccbe818cffb279f4)
Co-authored-by: Yusuf Erdem <[email protected]>
README.md
CHANGED
@@ -18,13 +18,25 @@ datasets:
|
|
18 |
This is a Turkish visual language model designed for multi-modal visual instruction-following tasks. It utilizes the LLaVA (Large Language and Vision Assistant) architecture, integrating the `ytucosmos/Turkish-Llama-8b-Instruct-v0.1` language model. The model is capable of processing both visual (image) and textual inputs, allowing it to understand and execute instructions provided in Turkish.
|
19 |
|
20 |
# Model Details
|
21 |
-
The model was pretrained
|
22 |
It was further fine-tuned using subsets of the following datasets to enhance its visual reasoning and understanding capabilities:
|
23 |
- **[Stanford GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html)**
|
24 |
- **[VisualGenome](https://homes.cs.washington.edu/~ranjay/visualgenome/index.html)**
|
25 |
- **[COCO](https://cocodataset.org/#home)**
|
|
|
26 |
|
27 |
## Example Usage
28 |
```python
|
29 |
from lmdeploy import pipeline, ChatTemplateConfig
|
30 |
from lmdeploy.vl import load_image
|
@@ -38,6 +50,7 @@ image = load_image(url)
|
|
38 |
response = pipe(('Bu resimde öne çıkan ögeler nelerdir?', image))
|
39 |
|
40 |
print(response)
|
|
|
41 |
"""
|
42 |
Resimde, çiçeklerle dolu bir bahçede yavru bir köpek ve arka planda bir ağaç yer alıyor.
|
43 |
Köpek, çiçeklerin arasında otururken ve etrafını saran çiçeklerin arasından bakarken görülebiliyor.
|
@@ -45,6 +58,9 @@ Bu sahne, köpeğin bahçede geçirdiği zamanın tadını çıkardığı ve çe
|
|
45 |
"""
|
46 |
```
|
47 |
|
48 |
# Acknowledgments
|
49 |
- Computing resources used in this work were provided by the National Center for High Performance Computing of Turkey (UHeM).
|
50 |
- Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗
|
|
|
18 |
This is a Turkish visual language model designed for multi-modal visual instruction-following tasks. It utilizes the LLaVA (Large Language and Vision Assistant) architecture, integrating the `ytucosmos/Turkish-Llama-8b-Instruct-v0.1` language model. The model is capable of processing both visual (image) and textual inputs, allowing it to understand and execute instructions provided in Turkish.
|
19 |
|
20 |
# Model Details
|
21 |
+
The model was pretrained on the **[LLaVA-CC3M-Pretrain-595K](https://huggingface.co/datasets/liuhaotian/LLaVA-CC3M-Pretrain-595K)** dataset, which was translated to Turkish using DeepL Translate.<br>
|
22 |
It was further fine-tuned using subsets of the following datasets to enhance its visual reasoning and understanding capabilities:
|
23 |
- **[Stanford GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html)**
|
24 |
- **[VisualGenome](https://homes.cs.washington.edu/~ranjay/visualgenome/index.html)**
|
25 |
- **[COCO](https://cocodataset.org/#home)**
|
26 |
+
- **110K multi-turn instruction-following data** consisting of **book covers**, to enhance the model's capabilities on OCR-related tasks.
|
27 |
|
28 |
## Example Usage
|
29 |
+
|
30 |
+
#### Using lmdeploy
|
31 |
+
|
32 |
+
1. Install requirements:
|
33 |
+
```
|
34 |
+
pip install 'lmdeploy>=0.4.0'
|
35 |
+
pip install git+https://github.com/haotian-liu/LLaVA.git --no-deps
|
36 |
+
```
|
37 |
+
|
38 |
+
2. Run the following code:
|
39 |
+
|
40 |
```python
|
41 |
from lmdeploy import pipeline, ChatTemplateConfig
|
42 |
from lmdeploy.vl import load_image
|
|
|
50 |
response = pipe(('Bu resimde öne çıkan ögeler nelerdir?', image))
|
51 |
|
52 |
print(response)
|
53 |
+
|
54 |
"""
|
55 |
Resimde, çiçeklerle dolu bir bahçede yavru bir köpek ve arka planda bir ağaç yer alıyor.
|
56 |
Köpek, çiçeklerin arasında otururken ve etrafını saran çiçeklerin arasından bakarken görülebiliyor.
|
|
|
58 |
"""
|
59 |
```
|
60 |
|
61 |
+
Image used in this example:
|
62 |
+
<img src="./example.png"/>
|
63 |
+
|
64 |
# Acknowledgments
|
65 |
- Computing resources used in this work were provided by the National Center for High Performance Computing of Turkey (UHeM).
|
66 |
- Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗
|
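The install step added in this change pins `lmdeploy>=0.4.0`. The following is a minimal sketch, not part of the README, of how a script might check that requirement before building the pipeline. The helper names (`parse_release`, `meets_minimum`, `lmdeploy_is_recent_enough`) are hypothetical, and the comparison deliberately ignores pre-release suffixes, which is an assumption made to keep the example dependency-free.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text):
    """Return the leading numeric components of a version string as ints.

    Parsing stops at the first component that does not start with a digit,
    and any non-digit suffix (e.g. "rc1") is dropped. This is a simplification.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="0.4.0"):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def lmdeploy_is_recent_enough(minimum="0.4.0"):
    """Return True only if lmdeploy is installed and at least `minimum`."""
    try:
        installed = version("lmdeploy")
    except PackageNotFoundError:
        # Package is not installed in the current environment.
        return False
    return meets_minimum(installed, minimum)


if __name__ == "__main__":
    print("lmdeploy requirement satisfied:", lmdeploy_is_recent_enough())
```

For stricter comparisons that order pre-releases correctly, the `packaging` library's `Version` class may be a better fit than this hand-rolled parser.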