csuhan committed on
Commit
225975e
1 Parent(s): f370dfe

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +0 -88
  2. app.py +36 -5
README.md CHANGED
@@ -10,91 +10,3 @@ pinned: false
  ---
 
  # OneLLM: One Framework to Align All Modalities with Language
-
- [[Project Page](https://onellm.csuhan.com)] [[Paper](#)] [[Web Demo](https://huggingface.co/spaces/csuhan/OneLLM)]
-
- Authors: [Jiaming Han](), [Kaixiong Gong](), [Yiyuan Zhang](), [Jiaqi Wang](), [Kaipeng Zhang](), [Dahua Lin](), [Yu Qiao](), [Peng Gao](), [Xiangyu Yue]().
-
- ## News
-
- - **2023.12.01** Release model weights and inference code.
-
- ## Contents
-
- - [Install](#install)
- - [Models](#models)
- - [Demo](#demo)
-
- <!-- - [Evaluation](#evaluation) -->
-
- <!-- - [Training](#training) -->
-
- ### TODO
-
- - [ ] Data
- - [ ] Evaluation
- - [ ] Training
-
- ### Install
-
- 1. Clone the repo into a local folder.
-
- ```bash
- git clone https://github.com/csuhan/OneLLM
-
- cd OneLLM
- ```
-
- 2. Install packages.
-
- ```bash
- conda create -n onellm python=3.9 -y
- conda activate onellm
-
- pip install -r requirements.txt
-
- # install pointnet
- cd lib/pointnet2
- python setup.py install
- ```
-
- 3. Install Apex. (Optional)
-
- ```bash
- git clone https://github.com/NVIDIA/apex
- cd apex
- pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
- ```
-
- ### Models
-
- We provide a preview model at: [csuhan/OneLLM-7B](https://huggingface.co/csuhan/OneLLM-7B).
-
- ### Demo
-
- **Huggingface Demo:** [csuhan/OneLLM](https://huggingface.co/spaces/csuhan/OneLLM).
-
- **Local Demo:** Assume you have downloaded the weights to ${WEIGHTS_DIR}. Then run the following command to start a gradio demo locally.
-
- ```bash
- python demos/multi_turn_mm.py --gpu_ids 0 --tokenizer_path config/llama2/tokenizer.model --llama_config config/llama2/7B.json --pretrained_path ${WEIGHTS_DIR}/consolidated.00-of-01.pth
- ```
-
- <!-- ### Evaluation -->
-
- <!-- ### Training -->
-
- ## Citation
-
- ```
- @article{han2023onellm,
-   title={OneLLM: One Framework to Align All Modalities with Language},
-   author={Han, Jiaming and Gong, Kaixiong and Zhang, Yiyuan and Wang, Jiaqi and Zhang, Kaipeng and Lin, Dahua and Qiao, Yu and Gao, Peng and Yue, Xiangyu},
-   journal={arXiv preprint arXiv:xxxx},
-   year={2023}
- }
- ```
-
- ## Acknowledgement
-
- [LLaMA](https://github.com/facebookresearch/llama), [LLaMA-Adapter](https://github.com/OpenGVLab/LLaMA-Adapter), [LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory), [Meta-Transformer](https://github.com/invictus717/MetaTransformer), [ChatBridge](https://github.com/joez17/ChatBridge)
 
app.py CHANGED
@@ -183,20 +183,46 @@ def gradio_worker(
  chatbot = []
  msg = ""
  return chatbot, msg
+
+ def change_modality(inputs):
+     tab = inputs[0]
+     modality = 'image'
+     label_modal_dict = {
+         'Image': 'image',
+         'Video': 'video',
+         'Audio': 'audio',
+         'Point Cloud': 'point',
+         'IMU': 'imu',
+         'fMRI': 'fmri',
+         'Depth Map': 'rgbd',
+         'Normal Map': 'rgbn'
+     }
+     if tab.label in label_modal_dict:
+         modality = label_modal_dict[tab.label]
+     return modality
 
  CSS ="""
  .contain { display: flex; flex-direction: column; }
  #component-0 { height: 100%; }
  #chatbot { flex-grow: 1; overflow: auto;}
  """
- with gr.Blocks(css=CSS) as demo:
+
+ with gr.Blocks(css=CSS, theme=gr.themes.Soft()) as demo:
  gr.Markdown("## OneLLM: One Framework to Align All Modalities with Language")
  with gr.Row(equal_height=True):
+ # with gr.Column(scale=1):
+ #     img_path = gr.Image(label='Image Input', type='filepath')
+ #     video_path = gr.Video(label='Video Input')
+ #     audio_path = gr.Audio(label='Audio Input', type='filepath', sources=['upload'])
+ #     modality = gr.Radio(choices=['image', 'audio', 'video'], value='image', interactive=True, label='Input Modalities', visible=False)
+ modality = gr.Textbox(value='image', visible=False)
  with gr.Column(scale=1):
- img_path = gr.Image(label='Image Input', type='filepath')
- video_path = gr.Video(label='Video Input')
- audio_path = gr.Audio(label='Audio Input', type='filepath', sources=['upload'])
- modality = gr.Radio(choices=['image', 'audio', 'video'], value='image', interactive=True, label='Input Modalities')
+ with gr.Tab('Image') as img_tab:
+     img_path = gr.Image(label='Image Input', type='filepath')
+ with gr.Tab('Video') as video_tab:
+     video_path = gr.Video(label='Video Input')
+ with gr.Tab('Audio') as audio_tab:
+     audio_path = gr.Audio(label='Audio Input', type='filepath', sources=['upload'])
 
  with gr.Column(scale=2):
  chatbot = gr.Chatbot(elem_id="chatbot")
@@ -220,6 +246,11 @@ def gradio_worker(
  minimum=0, maximum=1, value=0.75, interactive=True,
  label="Top-p",
  )
+
+ img_tab.select(change_modality, [img_tab], [modality])
+ video_tab.select(change_modality, [video_tab], [modality])
+ audio_tab.select(change_modality, [audio_tab], [modality])
+
  msg.submit(
  show_user_input, [msg, chatbot], [msg, chatbot],
  ).then(
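
For context, the app.py change replaces the visible modality `gr.Radio` with one `gr.Tab` per modality that writes into a hidden `gr.Textbox`, which downstream handlers read to pick the preprocessing branch. Below is a minimal, self-contained sketch of that pattern; it is not the repository's code: the component names and the `respond` handler are illustrative, it assumes a recent Gradio release, and it returns a fixed string per tab instead of reading `tab.label` inside a shared `change_modality` handler as the commit does.

```python
# Minimal sketch (assumption: Gradio 4.x). Selecting a tab updates a hidden
# Textbox that later event handlers receive as the current input modality.
import gradio as gr


def respond(message, history, modality):
    # Placeholder for the real model call; just echoes the active modality.
    history = (history or []) + [(message, f"(would run the {modality} branch here)")]
    return history, ""


with gr.Blocks() as demo:
    modality = gr.Textbox(value="image", visible=False)  # hidden modality state
    with gr.Tab("Image") as img_tab:
        img_path = gr.Image(label="Image Input", type="filepath")
    with gr.Tab("Video") as video_tab:
        video_path = gr.Video(label="Video Input")
    with gr.Tab("Audio") as audio_tab:
        audio_path = gr.Audio(label="Audio Input", type="filepath")
    chatbot = gr.Chatbot()
    msg = gr.Textbox(label="Message")

    # Each Tab's select event overwrites the hidden Textbox with its modality key.
    img_tab.select(lambda: "image", None, modality)
    video_tab.select(lambda: "video", None, modality)
    audio_tab.select(lambda: "audio", None, modality)

    msg.submit(respond, [msg, chatbot, modality], [chatbot, msg])

if __name__ == "__main__":
    demo.launch()
```

Keeping the modality in a hidden component rather than a visible Radio lets the tab the user is looking at drive the model's input branch, with no extra control to keep in sync.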