Update README.md
README.md CHANGED
@@ -17,6 +17,11 @@ Base LLM: [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5)
The model can generate interleaved images and videos, despite the absence of image-video pairs in the dataset. Video-LLaVA uses an encoder trained for a unified visual representation through alignment prior to projection.
Extensive experiments demonstrate the complementarity of modalities, showcasing significant superiority when compared to models specifically designed for either images or videos.

+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/videollava_example.png" alt="drawing" width="600"/>
+
+<small> Video-LLaVA example. Taken from the <a href="https://arxiv.org/abs/2311.10122">original paper.</a> </small>
+

**Paper or resources for more information:**
https://github.com/PKU-YuanGroup/Video-LLaVA
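For quick reference, a minimal inference sketch using the 🤗 Transformers integration is shown below. It is a sketch under assumptions, not part of this card's documented usage: the checkpoint name `LanguageBind/Video-LLaVA-7B-hf` and the USER/ASSISTANT prompt with a `<video>` placeholder token come from the Transformers Video-LLaVA integration, and the random frame array stands in for a real decoded video clip.

```python
# Minimal Video-LLaVA inference sketch (assumed checkpoint:
# "LanguageBind/Video-LLaVA-7B-hf"; swap in this card's weights if they differ).
# The random frames below are a placeholder for a real decoded clip
# (e.g. 8 frames sampled from a video with PyAV or decord).
import numpy as np
import torch
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration

model_id = "LanguageBind/Video-LLaVA-7B-hf"  # assumption, see lead-in
processor = VideoLlavaProcessor.from_pretrained(model_id)
model = VideoLlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# 8 placeholder RGB frames, shape (num_frames, height, width, channels)
video = np.random.randint(0, 256, (8, 224, 224, 3), dtype=np.uint8)

# Chat-style prompt; the <video> token marks where the video features are inserted
prompt = "USER: <video>\nWhat is happening in this video? ASSISTANT:"
inputs = processor(text=prompt, videos=video, return_tensors="pt").to(
    model.device, torch.float16
)

output_ids = model.generate(**inputs, max_new_tokens=60)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

The same processor also accepts still images (via its `images` argument), which is how a single checkpoint serves both modalities through the unified visual representation described above.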