add demo
app.py
CHANGED
@@ -30,11 +30,12 @@ description1 ="""The **🌋📹LLaVA-Video-7B-Qwen2** is a 7B parameter model t
 This model leverages the **SO400M vision backbone** for visual input and Qwen2 for language processing, making it highly efficient in multi-modal reasoning, including visual and video-based tasks.
 🌋📹LLaVA-Video has larger variants of [32B](https://huggingface.co/lmms-lab/LLaVA-NeXT-Video-32B-Qwen) and [72B](https://huggingface.co/lmms-lab/LLaVA-Video-72B-Qwen2) and with a [variant](https://huggingface.co/lmms-lab/LLaVA-Video-7B-Qwen2-Video-Only) only trained on the new synthetic data
 For further details, please visit the [Project Page](https://github.com/LLaVA-VL/LLaVA-NeXT) or check out the corresponding [research paper](https://arxiv.org/abs/2410.02713).
-
-description2 ="""- **Architecture**: `LlavaQwenForCausalLM`
+- **Architecture**: `LlavaQwenForCausalLM`
 - **Attention Heads**: 28
 - **Hidden Layers**: 28
 - **Hidden Size**: 3584
+"""
+description2 ="""
 - **Intermediate Size**: 18944
 - **Max Frames Supported**: 64
 - **Languages Supported**: English, Chinese
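
The hunk moves the first four spec bullets into `description1` and leaves the last three in `description2`. As a rough illustration of that split, the sketch below rebuilds only the bullet lines visible in the diff and parses them into one dictionary. The names `description1_tail`, `parse_bullets`, `specs`, and `head_dim` are hypothetical and do not appear in `app.py`. The per-head dimension is derived on the assumption that the hidden size is divided evenly across attention heads, which the diff itself does not state.

```python
import re

# Hypothetical stand-ins for the strings after this commit. Only the bullet
# lines visible in the hunk are reproduced; the rest of description1 is omitted.
description1_tail = """- **Architecture**: `LlavaQwenForCausalLM`
- **Attention Heads**: 28
- **Hidden Layers**: 28
- **Hidden Size**: 3584
"""
description2 = """
- **Intermediate Size**: 18944
- **Max Frames Supported**: 64
- **Languages Supported**: English, Chinese
"""


def parse_bullets(text: str) -> dict[str, str]:
    """Map each '- **Key**: value' bullet line to a key/value pair."""
    pattern = re.compile(r"^- \*\*(.+?)\*\*: (.+)$", re.MULTILINE)
    return {key: value.strip("`") for key, value in pattern.findall(text)}


# Merge the bullets from both strings into a single spec table.
specs = {**parse_bullets(description1_tail), **parse_bullets(description2)}

# Assumption: hidden size is split evenly across heads (not stated in the diff).
head_dim = int(specs["Hidden Size"]) // int(specs["Attention Heads"])
print(specs["Architecture"], head_dim)
```

Parsing both strings gives the same seven entries regardless of which string a bullet lives in, so the commit appears to change only how the text is grouped for display, not the listed values.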