---
license: other
license_name: license-seed-x-17b
license_link: LICENSE
---

# SEED-X
[![arXiv](https://img.shields.io/badge/arXiv-2404.14396-b31b1b.svg)](https://arxiv.org/abs/2404.14396)
[![Demo](https://img.shields.io/badge/Gradio-Demo-orange)](https://139a5c1d085953f17b.gradio.live/)
[![Static Badge](https://img.shields.io/badge/Model-Huggingface-yellow)](https://huggingface.co/AILab-CVC/SEED-X-17B)

We introduce SEED-X, a unified and versatile foundation model that can serve as various multimodal AI assistants **in the real world** after different instruction tuning, responding to a variety of user needs by unifying **multi-granularity comprehension and generation**.

All models and inference code are released!

## News
**2024-04-22** :hugs: We release the [models](https://huggingface.co/AILab-CVC/SEED-X-17B), including the pre-trained foundation model **SEED-X**, the general instruction-tuned model **SEED-X-I**, the editing model **SEED-X-Edit**, and our de-tokenizer, which can generate realistic images from ViT features (with or without a condition image).

**2024-04-22** :hugs: We release an online [gradio demo](https://139a5c1d085953f17b.gradio.live/) of the general instruction-tuned model SEED-X-I. SEED-X-I can follow multimodal instructions (including images with dynamic resolutions) and respond with images, text, and bounding boxes in multi-turn conversations. SEED-X-I **does not support image manipulation**; if you want to experience SEED-X-Edit for high-precision image editing, the inference code and model will be released soon.

## TODOs
- [x] Release the multimodal foundation model SEED-X.
- [x] Release the instruction-tuned model SEED-X-Edit for high-precision image editing.
- [ ] Release 3.7M in-house image editing data.

![image](https://github.com/AILab-CVC/SEED-X/blob/main/demos/teaser.jpg?raw=true)

![image](https://github.com/AILab-CVC/SEED-X/blob/main/demos/case_example.jpg?raw=true)

## Usage

### Dependencies
- Python >= 3.8 (we recommend using [Anaconda](https://www.anaconda.com/download/#linux))
- [PyTorch >= 2.0.1](https://pytorch.org/)
- NVIDIA GPU + [CUDA](https://developer.nvidia.com/cuda-downloads)
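
For example, a fresh Conda environment meeting these requirements could be created as follows (a minimal sketch; the environment name, Python version, and install commands are illustrative, not pinned by the repo):

```bash
# Create and activate an isolated environment (name and version are examples).
conda create -n seed-x python=3.10 -y
conda activate seed-x
# Install a PyTorch build >= 2.0.1 that matches your CUDA version; see
# https://pytorch.org/get-started/locally/ for the exact command.
pip install "torch>=2.0.1"
```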

### Installation
Clone the repo and install the required packages:

```bash
git clone https://github.com/AILab-CVC/SEED-X.git
cd SEED-X
pip install -r requirements.txt
```

### Model Weights
We release the pre-trained de-tokenizer, the pre-trained foundation model **SEED-X**, the general instruction-tuned model **SEED-X-I**, and the editing model **SEED-X-Edit** in [SEED-X-17B on Hugging Face](https://huggingface.co/AILab-CVC/SEED-X-17B).

Please download the checkpoints and save them under the folder `./pretrained`, e.g. `./pretrained/seed_x`.
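
For example, with a recent `huggingface_hub` installed, the checkpoints can be fetched as follows (a sketch; make sure the resulting subfolder layout under `./pretrained` matches what the inference scripts expect):

```bash
# Download the released SEED-X checkpoints into ./pretrained (illustrative).
huggingface-cli download AILab-CVC/SEED-X-17B --local-dir ./pretrained
# The stable-diffusion-xl-base-1.0 and Qwen-VL-Chat repos mentioned below can be
# fetched the same way, each into its own subfolder of ./pretrained.
```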

You also need to download [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) and save them under the folder `./pretrained`. Then use the following script to extract the weights of the visual encoder from Qwen-VL-Chat:

```bash
python3 src/tools/reload_qwen_vit.py
```

### Inference with the SEED-X de-tokenizer
```bash
# For image reconstruction with ViT image features
python3 src/inference/eval_seed_x_detokenizer.py
# For image reconstruction with ViT image features and a conditional image
python3 src/inference/eval_seed_x_detokenizer_with_condition.py
```

### Inference with the pre-trained model SEED-X
```bash
# For image comprehension and detection
python3 src/inference/eval_img2text_seed_x.py
# For image generation
python3 src/inference/eval_text2img_seed_x.py
```

### Inference with the general instruction-tuned model SEED-X-I
```bash
# For image comprehension and detection
python3 src/inference/eval_img2text_seed_x_i.py
# For image generation
python3 src/inference/eval_text2img_seed_x_i.py
```

### Inference with the editing model SEED-X-Edit
```bash
# For image editing
python3 src/inference/eval_img2edit_seed_x_edit.py
```

## Citation
If you find the work helpful, please consider citing:

```bibtex
@article{ge2024seed,
  title={SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation},
  author={Ge, Yuying and Zhao, Sijie and Zhu, Jinguo and Ge, Yixiao and Yi, Kun and Song, Lin and Li, Chen and Ding, Xiaohan and Shan, Ying},
  journal={arXiv preprint arXiv:2404.14396},
  year={2024}
}
```

## License
`SEED` is licensed under the Apache License Version 2.0, except for the third-party components listed in [License](License_Seed-X.txt).

During the training of SEED-X, we freeze the original parameters of LLaMA2 and optimize only the LoRA modules.
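
For intuition, this freeze-the-base-and-train-LoRA setup can be sketched with the `peft` library (illustrative only, not the repo's actual training code; the base model name and LoRA hyperparameters are assumptions):

```python
# Illustrative sketch of the described setup: freeze the base LLM's weights and
# train only LoRA adapters. Model name and LoRA settings are assumed for demonstration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")
lora_cfg = LoraConfig(
    r=16,                                 # LoRA rank (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)  # peft freezes the base weights here
model.print_trainable_parameters()      # only the LoRA parameters remain trainable
```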