Spaces: Running on Zero

add gradio interface

app.py CHANGED
```diff
@@ -28,7 +28,11 @@ import shutil
 title = "# 🙋🏻♂️Welcome to 🌟Tonic's 🌋📹LLaVA-Video!"
 description1 ="""The **🌋📹LLaVA-Video-7B-Qwen2** is a 7B parameter model trained on the 🌋📹LLaVA-Video-178K dataset and the LLaVA-OneVision dataset. It is [based on the **Qwen2 language model**](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f), supporting a context window of up to 32K tokens. The model can process and interact with images, multi-images, and videos, with specific optimizations for video analysis.
 This model leverages the **SO400M vision backbone** for visual input and Qwen2 for language processing, making it highly efficient in multi-modal reasoning, including visual and video-based tasks.
+<<<<<<< HEAD
 🌋📹LLaVA-Video has larger variants of [32B](https://huggingface.co/lmms-lab/LLaVA-NeXT-Video-32B-Qwen) and [72B](https://huggingface.co/lmms-lab/LLaVA-Video-72B-Qwen2) and with a [variant](https://huggingface.co/lmms-lab/LLaVA-Video-7B-Qwen2-Video-Only) only trained on the new synthetic data
+=======
+🌋📹LLaVA-Video has larger variants of [32B](https://huggingface.co/lmms-lab/LLaVA-NeXT-Video-32B-Qwen) and [72B](https://huggingface.co/lmms-lab/LLaVA-Video-72B-Qwen2) and with a [variant](https://huggingface.co/lmms-lab/LLaVA-Video-7B-Qwen2-Video-Only only trained on the new synthetic data
+>>>>>>> b297085e69127cc23e50f4a85219c4057e5ce2c5
 For further details, please visit the [Project Page](https://github.com/LLaVA-VL/LLaVA-NeXT) or check out the corresponding [research paper](https://arxiv.org/abs/2410.02713).
 """
 description2 ="""- **Architecture**: `LlavaQwenForCausalLM`
@@ -46,7 +50,7 @@ description2 ="""- **Architecture**: `LlavaQwenForCausalLM`
 - **Hardware Used for Training**: 256 * Nvidia Tesla A100 GPUs
 """
 
-
+join_us = """
 ## Join us :
 🌟TeamTonic🌟 is always making cool demos! Join our active builder's 🛠️community 👻 [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP) On 🤗Huggingface:[MultiTransformer](https://huggingface.co/MultiTransformer) On 🌐Github: [Tonic-AI](https://github.com/tonic-ai) & contribute to🌟 [Build Tonic](https://git.tonic-ai.com/contribute)🤗Big thanks to Yuvi Sharma and all the folks at huggingface for the community grant 🤗
 """
```
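Note that the additions in the first hunk commit the Git merge conflict markers (`<<<<<<< HEAD`, `=======`, `>>>>>>> b297085e...`) inside the `description1` triple-quoted string; the two sides differ only in that the incoming side drops the closing parenthesis of the `[variant]` link, and `gr.Markdown` will display the markers verbatim. A hypothetical resolved form of that passage, not part of this commit, could look like:

```python
# Hypothetical resolution of the conflicted passage inside description1
# (assumption for illustration only; this commit keeps both sides plus the markers).
variants_note = (
    "🌋📹LLaVA-Video has larger variants of "
    "[32B](https://huggingface.co/lmms-lab/LLaVA-NeXT-Video-32B-Qwen) and "
    "[72B](https://huggingface.co/lmms-lab/LLaVA-Video-72B-Qwen2), plus a "
    "[variant](https://huggingface.co/lmms-lab/LLaVA-Video-7B-Qwen2-Video-Only) "
    "trained only on the new synthetic data."
)
```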
```diff
@@ -119,9 +123,14 @@ def gradio_interface(video_file, question):
     return response
 
 with gr.Blocks() as demo:
-    gr.Markdown(
-    gr.
-
+    gr.Markdown(title)
+    with gr.Row():
+        with gr.Group():
+            gr.Markdown(description1)
+        with gr.Group():
+            gr.Markdown(description2)
+    with gr.Accordion("Join Us", open=False):
+        gr.Markdown(join_us)
     with gr.Row():
         with gr.Column():
             video_input = gr.Video()
```
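The third hunk swaps the truncated `gr.Markdown(` / `gr.` lines for a complete Blocks layout. Below is a minimal, self-contained sketch of how the resulting interface fits together; the abbreviated markdown strings, the `answer_question` handler, and the question/submit/response widgets are assumptions standing in for the rest of app.py, where the real `gradio_interface(video_file, question)` runs LLaVA-Video inference.

```python
import gradio as gr

# Abbreviated stand-ins for the markdown strings defined earlier in app.py.
title = "# 🙋🏻♂️Welcome to 🌟Tonic's 🌋📹LLaVA-Video!"
description1 = "Model overview goes here."
description2 = "- **Architecture**: `LlavaQwenForCausalLM`"
join_us = "## Join us :"

def answer_question(video_file, question):
    # Placeholder for gradio_interface(video_file, question), which in the
    # real app samples frames from the uploaded video and runs LLaVA-Video.
    return f"Video: {video_file} | Question: {question}"

with gr.Blocks() as demo:
    gr.Markdown(title)
    with gr.Row():
        with gr.Group():
            gr.Markdown(description1)
        with gr.Group():
            gr.Markdown(description2)
    with gr.Accordion("Join Us", open=False):
        gr.Markdown(join_us)
    with gr.Row():
        with gr.Column():
            video_input = gr.Video()
            question_input = gr.Textbox(label="Question")   # assumed input, not shown in this hunk
            submit_btn = gr.Button("Ask")                    # assumed, not shown in this hunk
        with gr.Column():
            response_output = gr.Textbox(label="Response")   # assumed output, not shown in this hunk
    submit_btn.click(
        answer_question,
        inputs=[video_input, question_input],
        outputs=response_output,
    )

if __name__ == "__main__":
    demo.launch()
```

Grouping the two description blocks inside a `gr.Row` places them side by side, while the `gr.Accordion` keeps the community links collapsed by default.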