Update README.md
README.md CHANGED
@@ -8,7 +8,7 @@ license: apache-2.0
 ### Architecture
 
 <p align="left">
-<img src="https://raw.githubusercontent.com/AIRI-Institute/OmniFusion/main/content/architecture.png" width="
+<img src="https://raw.githubusercontent.com/AIRI-Institute/OmniFusion/main/content/architecture.png" width="100%">
 </p>
 
 
@@ -25,14 +25,14 @@ To further enhance the model's multimodal capabilities, we employ trainable spec
 2. Once the adapter has learned to map ViT's visual embeddings to the language model's textual space, we proceed to unfreeze Mistral for improved understanding of dialog formats and complex queries.
 
 <p align="left">
-<img src="https://raw.githubusercontent.com/AIRI-Institute/OmniFusion/main/content/datasets.png" width="
+<img src="https://raw.githubusercontent.com/AIRI-Institute/OmniFusion/main/content/datasets.png" width="70%">
 </p>
 
 ### Results
 
 OmniFusion was benchmarked against the latest multimodal SOTA models. It excelled in generative metrics and classification benchmarks like VisualDialog.
 <p align="left">
-<img src="https://raw.githubusercontent.com/AIRI-Institute/OmniFusion/main/content/radar.png" width="
+<img src="https://raw.githubusercontent.com/AIRI-Institute/OmniFusion/main/content/radar.png" width="70%">
 </p>
 
 Model Performance on Visual Dialog Benchmark
@@ -45,7 +45,7 @@ Model Performance on Visual Dialog Benchmark
 ### Examples
 
 <p align="left">
-<img src="https://raw.githubusercontent.com/AIRI-Institute/OmniFusion/main/content/examples.png" width="
+<img src="https://raw.githubusercontent.com/AIRI-Institute/OmniFusion/main/content/examples.png" width="100%">
 </p>
 
 ### Future Plans