csuhan committed on
Commit
225975e
1 Parent(s): f370dfe

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +0 -88
  2. app.py +36 -5
README.md CHANGED
@@ -10,91 +10,3 @@ pinned: false
  ---
 
  # OneLLM: One Framework to Align All Modalities with Language
-
- [[Project Page](https://onellm.csuhan.com)] [[Paper](#)] [[Web Demo](https://huggingface.co/spaces/csuhan/OneLLM)]
-
- Authors: [Jiaming Han](), [Kaixiong Gong](), [Yiyuan Zhang](), [Jiaqi Wang](), [Kaipeng Zhang](), [Dahua Lin](), [Yu Qiao](), [Peng Gao](), [Xiangyu Yue]().
-
- ## News
-
- - **2023.12.01** Release model weights and inference code.
-
- ## Contents
-
- - [Install](#install)
- - [Models](#models)
- - [Demo](#demo)
-
- <!-- - [Evaluation](#evaluation) -->
-
- <!-- - [Training](#training) -->
-
- ### TODO
-
- - [ ] Data
- - [ ] Evaluation
- - [ ] Training
-
- ### Install
-
- 1. Clone the repo into a local folder.
-
- ```bash
- git clone https://github.com/csuhan/OneLLM
-
- cd OneLLM
- ```
-
- 2. Install packages.
-
- ```bash
- conda create -n onellm python=3.9 -y
- conda activate onellm
-
- pip install -r requirements.txt
-
- # install pointnet
- cd lib/pointnet2
- python setup.py install
- ```
-
- 3. Install Apex. (Optional)
-
- ```bash
- git clone https://github.com/NVIDIA/apex
- cd apex
- pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
- ```
-
- ### Models
-
- We provide a preview model at: [csuhan/OneLLM-7B](https://huggingface.co/csuhan/OneLLM-7B).
-
- ### Demo
-
- **Huggingface Demo:** [csuhan/OneLLM](https://huggingface.co/spaces/csuhan/OneLLM).
-
- **Local Demo:** Assume you have downloaded the weights to ${WEIGHTS_DIR}. Then run the following command to start a gradio demo locally.
-
- ```bash
- python demos/multi_turn_mm.py --gpu_ids 0 --tokenizer_path config/llama2/tokenizer.model --llama_config config/llama2/7B.json --pretrained_path ${WEIGHTS_DIR}/consolidated.00-of-01.pth
- ```
-
- <!-- ### Evaluation -->
-
- <!-- ### Training -->
-
- ## Citation
-
- ```
- @article{han2023onellm,
-   title={OneLLM: One Framework to Align All Modalities with Language},
-   author={Han, Jiaming and Gong, Kaixiong and Zhang, Yiyuan and Wang, Jiaqi and Zhang, Kaipeng and Lin, Dahua and Qiao, Yu and Gao, Peng and Yue, Xiangyu},
-   journal={arXiv preprint arXiv:xxxx},
-   year={2023}
- }
- ```
-
- ## Acknowledgement
-
- [LLaMA](https://github.com/facebookresearch/llama), [LLaMA-Adapter](https://github.com/OpenGVLab/LLaMA-Adapter), [LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory), [Meta-Transformer](https://github.com/invictus717/MetaTransformer), [ChatBridge](https://github.com/joez17/ChatBridge)
 
app.py CHANGED
@@ -183,20 +183,46 @@ def gradio_worker(
  chatbot = []
  msg = ""
  return chatbot, msg
+
+ def change_modality(inputs):
+     tab = inputs[0]
+     modality = 'image'
+     label_modal_dict = {
+         'Image': 'image',
+         'Video': 'video',
+         'Audio': 'audio',
+         'Point Cloud': 'point',
+         'IMU': 'imu',
+         'fMRI': 'fmri',
+         'Depth Map': 'rgbd',
+         'Normal Map': 'rgbn'
+     }
+     if tab.label in label_modal_dict:
+         modality = label_modal_dict[tab.label]
+     return modality
 
  CSS ="""
  .contain { display: flex; flex-direction: column; }
  #component-0 { height: 100%; }
  #chatbot { flex-grow: 1; overflow: auto;}
  """
- with gr.Blocks(css=CSS) as demo:
+
+ with gr.Blocks(css=CSS, theme=gr.themes.Soft()) as demo:
  gr.Markdown("## OneLLM: One Framework to Align All Modalities with Language")
  with gr.Row(equal_height=True):
+ # with gr.Column(scale=1):
+ #     img_path = gr.Image(label='Image Input', type='filepath')
+ #     video_path = gr.Video(label='Video Input')
+ #     audio_path = gr.Audio(label='Audio Input', type='filepath', sources=['upload'])
+ #     modality = gr.Radio(choices=['image', 'audio', 'video'], value='image', interactive=True, label='Input Modalities', visible=False)
+ modality = gr.Textbox(value='image', visible=False)
  with gr.Column(scale=1):
- img_path = gr.Image(label='Image Input', type='filepath')
- video_path = gr.Video(label='Video Input')
- audio_path = gr.Audio(label='Audio Input', type='filepath', sources=['upload'])
- modality = gr.Radio(choices=['image', 'audio', 'video'], value='image', interactive=True, label='Input Modalities')
+ with gr.Tab('Image') as img_tab:
+     img_path = gr.Image(label='Image Input', type='filepath')
+ with gr.Tab('Video') as video_tab:
+     video_path = gr.Video(label='Video Input')
+ with gr.Tab('Audio') as audio_tab:
+     audio_path = gr.Audio(label='Audio Input', type='filepath', sources=['upload'])
 
  with gr.Column(scale=2):
  chatbot = gr.Chatbot(elem_id="chatbot")
@@ -220,6 +246,11 @@ def gradio_worker(
  minimum=0, maximum=1, value=0.75, interactive=True,
  label="Top-p",
  )
+
+ img_tab.select(change_modality, [img_tab], [modality])
+ video_tab.select(change_modality, [video_tab], [modality])
+ audio_tab.select(change_modality, [audio_tab], [modality])
+
  msg.submit(
  show_user_input, [msg, chatbot], [msg, chatbot],
  ).then(
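
For context, the app.py change replaces the visible modality `gr.Radio` with one `gr.Tab` per modality that writes into a hidden `gr.Textbox`, which downstream handlers read to pick the preprocessing branch. Below is a minimal, self-contained sketch of that pattern; it is not the repository's code: the component names and the `respond` handler are illustrative, it assumes a recent Gradio release, and it returns a fixed string per tab instead of reading `tab.label` inside a shared `change_modality` handler as the commit does.

```python
# Minimal sketch (assumption: Gradio 4.x). Selecting a tab updates a hidden
# Textbox that later event handlers receive as the current input modality.
import gradio as gr


def respond(message, history, modality):
    # Placeholder for the real model call; just echoes the active modality.
    history = (history or []) + [(message, f"(would run the {modality} branch here)")]
    return history, ""


with gr.Blocks() as demo:
    modality = gr.Textbox(value="image", visible=False)  # hidden modality state
    with gr.Tab("Image") as img_tab:
        img_path = gr.Image(label="Image Input", type="filepath")
    with gr.Tab("Video") as video_tab:
        video_path = gr.Video(label="Video Input")
    with gr.Tab("Audio") as audio_tab:
        audio_path = gr.Audio(label="Audio Input", type="filepath")
    chatbot = gr.Chatbot()
    msg = gr.Textbox(label="Message")

    # Each Tab's select event overwrites the hidden Textbox with its modality key.
    img_tab.select(lambda: "image", None, modality)
    video_tab.select(lambda: "video", None, modality)
    audio_tab.select(lambda: "audio", None, modality)

    msg.submit(respond, [msg, chatbot, modality], [chatbot, msg])

if __name__ == "__main__":
    demo.launch()
```

Keeping the modality in a hidden component rather than a visible Radio lets the tab the user is looking at drive the model's input branch, with no extra control to keep in sync.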