<p align="center">
  <img src="assets/logo.png" width="400">
</p>

## DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior

[Paper](https://arxiv.org/abs/2308.15070) | [Project Page](https://0x3f3f3f3fun.github.io/projects/diffbir/)

![visitors](https://visitor-badge.laobi.icu/badge?page_id=XPixelGroup/DiffBIR) [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/linxinqi/DiffBIR-official) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/camenduru/DiffBIR-colab/blob/main/DiffBIR_colab.ipynb)

[Xinqi Lin](https://0x3f3f3f3fun.github.io/)<sup>1,\*</sup>, [Jingwen He](https://github.com/hejingwenhejingwen)<sup>2,3,\*</sup>, [Ziyan Chen](https://orcid.org/0000-0001-6277-5635)<sup>1</sup>, [Zhaoyang Lyu](https://scholar.google.com.tw/citations?user=gkXFhbwAAAAJ&hl=en)<sup>2</sup>, [Bo Dai](http://daibo.info/)<sup>2</sup>, [Fanghua Yu](https://github.com/Fanghua-Yu)<sup>1</sup>, [Wanli Ouyang](https://wlouyang.github.io/)<sup>2</sup>, [Yu Qiao](http://mmlab.siat.ac.cn/yuqiao)<sup>2</sup>, [Chao Dong](http://xpixel.group/2010/01/20/chaodong.html)<sup>1,2</sup>

<sup>1</sup>Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences<br><sup>2</sup>Shanghai AI Laboratory<br><sup>3</sup>The Chinese University of Hong Kong

<p align="center">
  <img src="assets/teaser.png">
</p>

---

<p align="center">
  <img src="assets/pipeline.png">
</p>

:star: If DiffBIR is helpful for you, please help star this repo. Thanks! :hugs:

## :book: Table Of Contents

- [Update](#update)
- [Visual Results On Real-world Images](#visual_results)
- [TODO](#todo)
- [Installation](#installation)
- [Pretrained Models](#pretrained_models)
- [Inference](#inference)
- [Train](#train)

## <a name="update"></a>:new: Update

- **2024.04.08**: ✅ Release everything about our [updated manuscript](https://arxiv.org/abs/2308.15070), including (1) a **new model** trained on a subset of laion2b-en and (2) a **more readable codebase**. DiffBIR is now a general restoration pipeline that handles different blind image restoration tasks with a unified generation module.
- **2023.09.19**: ✅ Add support for Apple Silicon! Check [installation_xOS.md](assets/docs/installation_xOS.md) to work with a **CPU/CUDA/MPS** device!
- **2023.09.14**: ✅ Integrate a patch-based sampling strategy ([mixture-of-diffusers](https://github.com/albarji/mixture-of-diffusers)). [**Try it!**](#patch-based-sampling) Here is an [example](https://imgsli.com/MjA2MDA1) with a resolution of 2396 x 1596. GPU memory usage will continue to be optimized, and we look forward to your pull requests!
- **2023.09.14**: ✅ Add support for a background upsampler (DiffBIR/[RealESRGAN](https://github.com/xinntao/Real-ESRGAN)) in face enhancement! :rocket: [**Try it!**](#inference_fr)
- **2023.09.13**: :rocket: Provide an online demo (DiffBIR-official) on [OpenXLab](https://openxlab.org.cn/apps/detail/linxinqi/DiffBIR-official), which integrates both the general model and the face model. Please have a try! [camenduru](https://github.com/camenduru) also implemented an online demo; thanks for his work! :hugs:
- **2023.09.12**: ✅ Upload inference code for latent image guidance and release the [real47](inputs/real47) testset.
- **2023.09.08**: ✅ Add support for restoring unaligned faces.
- **2023.09.06**: :rocket: Update the [colab demo](https://colab.research.google.com/github/camenduru/DiffBIR-colab/blob/main/DiffBIR_colab.ipynb). Thanks to [camenduru](https://github.com/camenduru)! :hugs:
- **2023.08.30**: This repo is released.

## <a name="visual_results"></a>:eyes: Visual Results On Real-world Images

### Blind Image Super-Resolution

[<img src="assets/visual_results/bsr6.png" height="223px"/>](https://imgsli.com/MTk5ODI3) [<img src="assets/visual_results/bsr7.png" height="223px"/>](https://imgsli.com/MTk5ODI4) [<img src="assets/visual_results/bsr4.png" height="223px"/>](https://imgsli.com/MTk5ODI1)

<!-- [<img src="assets/visual_results/bsr1.png" height="223px"/>](https://imgsli.com/MTk5ODIy) [<img src="assets/visual_results/bsr2.png" height="223px"/>](https://imgsli.com/MTk5ODIz)

[<img src="assets/visual_results/bsr3.png" height="223px"/>](https://imgsli.com/MTk5ODI0) [<img src="assets/visual_results/bsr5.png" height="223px"/>](https://imgsli.com/MjAxMjM0) -->

<!-- [<img src="assets/visual_results/bsr1.png" height="223px"/>](https://imgsli.com/MTk5ODIy) [<img src="assets/visual_results/bsr5.png" height="223px"/>](https://imgsli.com/MjAxMjM0) -->

### Blind Face Restoration

<!-- [<img src="assets/visual_results/bfr1.png" height="223px"/>](https://imgsli.com/MTk5ODI5) [<img src="assets/visual_results/bfr2.png" height="223px"/>](https://imgsli.com/MTk5ODMw) [<img src="assets/visual_results/bfr4.png" height="223px"/>](https://imgsli.com/MTk5ODM0) -->

[<img src="assets/visual_results/whole_image1.png" height="370"/>](https://imgsli.com/MjA2MTU0)
[<img src="assets/visual_results/whole_image2.png" height="370"/>](https://imgsli.com/MjA2MTQ4)

:star: Both the face and the background are enhanced by DiffBIR.

### Blind Image Denoising

[<img src="assets/visual_results/bid1.png" height="215px"/>](https://imgsli.com/MjUzNzkz) [<img src="assets/visual_results/bid3.png" height="215px"/>](https://imgsli.com/MjUzNzky)
[<img src="assets/visual_results/bid2.png" height="215px"/>](https://imgsli.com/MjUzNzkx)

### 8x Blind Super-Resolution With Patch-based Sampling

> I often think of Bag End. I miss my books, and my armchair, and my garden. See, that's where I belong. That's home. --- Bilbo Baggins

[<img src="assets/visual_results/tiled_sampling.png" height="480px"/>](https://imgsli.com/MjUzODE4)

## <a name="todo"></a>:climbing: TODO

- [x] Release code and pretrained models :computer:.
- [x] Update links to paper and project page :link:.
- [x] Release the real47 testset :minidisc:.
- [ ] Provide a WebUI.
- [ ] Reduce the VRAM usage of DiffBIR :fire::fire::fire:.
- [ ] Provide a HuggingFace demo :notebook:.
- [x] Add a patch-based sampling schedule :mag:.
- [x] Upload inference code for latent image guidance :page_facing_up:.
- [ ] Improve the performance :superhero:.
- [x] Support MPS acceleration for macOS users.
- [ ] DiffBIR-turbo :fire::fire::fire:.
- [ ] Speed up inference, e.g. with fp16/bf16 or torch.compile :fire::fire::fire:.

## <a name="installation"></a>:gear: Installation

```shell
# clone this repo
git clone https://github.com/XPixelGroup/DiffBIR.git
cd DiffBIR

# create environment
conda create -n diffbir python=3.10
conda activate diffbir
pip install -r requirements.txt
```

Our new code is based on PyTorch 2.2.2 for its built-in support for memory-efficient attention. If your GPU is not compatible with the latest PyTorch, downgrade PyTorch to 1.13.1+cu116 and install xformers 0.0.16 as an alternative.
<!-- Note the installation is only compatible with **Linux** users. If you are working on different platforms, please check [xOS Installation](assets/docs/installation_xOS.md). -->
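
As a quick sanity check, the short snippet below (illustrative, not part of the repo) reports whether memory-efficient attention is available, either through PyTorch 2.x or through the xformers fallback:

```python
# Illustrative environment check (not part of the repo): verify that
# memory-efficient attention is available, either via the built-in
# scaled_dot_product_attention of PyTorch 2.x or via xformers.
import torch

if hasattr(torch.nn.functional, "scaled_dot_product_attention"):
    print(f"PyTorch {torch.__version__}: built-in memory-efficient attention is available")
else:
    try:
        import xformers
        print(f"PyTorch {torch.__version__}: falling back to xformers {xformers.__version__}")
    except ImportError:
        print("Neither PyTorch 2.x SDPA nor xformers found -- please install one of them")
```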

## <a name="pretrained_models"></a>:dna: Pretrained Models

Here we list the pretrained weights of the stage 2 model (IRControlNet) and our trained SwinIR, which was used for degradation removal during the training of the stage 2 model.

| Model Name | Description | HuggingFace | BaiduNetdisk | OpenXLab |
| :---------: | :----------: | :----------: | :----------: | :----------: |
| v2.pth | IRControlNet trained on filtered laion2b-en | [download](https://huggingface.co/lxq007/DiffBIR-v2/resolve/main/v2.pth) | [download](https://pan.baidu.com/s/1uTAFl13xgGAzrnznAApyng?pwd=xiu3)<br>(pwd: xiu3) | [download](https://openxlab.org.cn/models/detail/linxinqi/DiffBIR/tree/main) |
| v1_general.pth | IRControlNet trained on ImageNet-1k | [download](https://huggingface.co/lxq007/DiffBIR-v2/resolve/main/v1_general.pth) | [download](https://pan.baidu.com/s/1PhXHAQSTOUX4Gy3MOc2t2Q?pwd=79n9)<br>(pwd: 79n9) | [download](https://openxlab.org.cn/models/detail/linxinqi/DiffBIR/tree/main) |
| v1_face.pth | IRControlNet trained on FFHQ | [download](https://huggingface.co/lxq007/DiffBIR-v2/resolve/main/v1_face.pth) | [download](https://pan.baidu.com/s/1kvM_SB1VbXjbipLxdzlI3Q?pwd=n7dx)<br>(pwd: n7dx) | [download](https://openxlab.org.cn/models/detail/linxinqi/DiffBIR/tree/main) |
| codeformer_swinir.ckpt | SwinIR trained on ImageNet-1k | [download](https://huggingface.co/lxq007/DiffBIR-v2/resolve/main/codeformer_swinir.ckpt) | [download](https://pan.baidu.com/s/176fARg2ySYtDgX2vQOeRbA?pwd=vfif)<br>(pwd: vfif) | [download](https://openxlab.org.cn/models/detail/linxinqi/DiffBIR/tree/main) |

During inference, we use off-the-shelf models from other papers as the stage 1 model: [BSRNet](https://github.com/cszn/BSRGAN) for BSR, the [SwinIR-Face](https://github.com/zsyOAOA/DifFace) used in DifFace for BFR, and [SCUNet-PSNR](https://github.com/cszn/SCUNet) for BID, while the trained IRControlNet remains **unchanged** for all tasks. Please check the [code](utils/inference.py) for more details. Thanks for their work!
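
The inference script fetches these weights automatically; if you prefer to download them by hand, a minimal sketch using `huggingface_hub` (assuming it is installed) looks like this:

```python
# Minimal sketch: manually fetch the stage 2 weights from HuggingFace.
# The inference script downloads them automatically, so this is optional.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="lxq007/DiffBIR-v2", filename="v2.pth")
print(ckpt_path)  # local cache path of the downloaded checkpoint
```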

<!-- ## <a name="quick_start"></a>:flight_departure:Quick Start

Download [general_full_v1.ckpt](https://huggingface.co/lxq007/DiffBIR/resolve/main/general_full_v1.ckpt) and [general_swinir_v1.ckpt](https://huggingface.co/lxq007/DiffBIR/resolve/main/general_swinir_v1.ckpt) to `weights/`, then run the following command to interact with the gradio website.

```shell
python gradio_diffbir.py \
--ckpt weights/general_full_v1.ckpt \
--config configs/model/cldm.yaml \
--reload_swinir \
--swinir_ckpt weights/general_swinir_v1.ckpt \
--device cuda
```

<div align="center">
    <kbd><img src="assets/gradio.png"></kbd>
</div> -->

## <a name="inference"></a>:crossed_swords: Inference

We provide some examples for inference; check [inference.py](inference.py) for the full list of arguments. Pretrained weights will be **automatically downloaded**.

### Blind Image Super-Resolution

```shell
python -u inference.py \
--version v2 \
--task sr \
--upscale 4 \
--cfg_scale 4.0 \
--input inputs/demo/bsr \
--output results/demo_bsr \
--device cuda
```

### Blind Face Restoration
<a name="inference_fr"></a>

```shell
# for aligned face inputs
python -u inference.py \
--version v2 \
--task fr \
--upscale 1 \
--cfg_scale 4.0 \
--input inputs/demo/bfr/aligned \
--output results/demo_bfr_aligned \
--device cuda
```

```shell
# for unaligned face inputs
python -u inference.py \
--version v2 \
--task fr_bg \
--upscale 2 \
--cfg_scale 4.0 \
--input inputs/demo/bfr/whole_img \
--output results/demo_bfr_unaligned \
--device cuda
```

### Blind Image Denoising

```shell
python -u inference.py \
--version v2 \
--task dn \
--upscale 1 \
--cfg_scale 4.0 \
--input inputs/demo/bid \
--output results/demo_bid \
--device cuda
```

### Other options

#### Patch-based sampling
<a name="patch_based_sampling"></a>

Add the following arguments to enable patch-based sampling:

```shell
[command...] --tiled --tile_size 512 --tile_stride 256
```

Patch-based sampling supports super-resolution with a large scale factor. Our patch-based sampling is built upon [mixture-of-diffusers](https://github.com/albarji/mixture-of-diffusers). Thanks for their work!
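
Conceptually, the image is split into overlapping tiles that are processed separately and blended back together with smooth weights, which keeps peak memory bounded by the tile size. The sketch below illustrates the idea with hypothetical helpers; it is not the repo's actual implementation:

```python
# Illustrative sketch of tiled (patch-based) sampling in the spirit of
# mixture-of-diffusers; hypothetical helpers, not the repo's implementation.
import numpy as np

def tile_starts(length: int, tile_size: int, stride: int) -> list[int]:
    """Start offsets of overlapping tiles covering [0, length); assumes length >= tile_size."""
    starts = list(range(0, max(length - tile_size, 0) + 1, stride))
    if starts[-1] + tile_size < length:  # ensure the last tile reaches the border
        starts.append(length - tile_size)
    return starts

def blend_tiles(tiles, coords, height, width, tile_size):
    """Fuse per-tile outputs using a smooth Gaussian weight mask."""
    out = np.zeros((height, width, 3))
    weight = np.zeros((height, width, 1))
    axis = np.linspace(-1.0, 1.0, tile_size)
    mask = np.exp(-(axis[:, None] ** 2 + axis[None, :] ** 2) / 0.5)[..., None]
    for tile, (top, left) in zip(tiles, coords):
        out[top:top + tile_size, left:left + tile_size] += tile * mask
        weight[top:top + tile_size, left:left + tile_size] += mask
    return out / weight  # normalize overlapping regions by accumulated weight
```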

#### Restoration Guidance

Restoration guidance is used to trade off quality against fidelity. It is disabled by default, since we prefer quality over fidelity. Here is an example:

```shell
python -u inference.py \
--version v2 \
--task sr \
--upscale 4 \
--cfg_scale 4.0 \
--input inputs/demo/bsr \
--guidance --g_loss w_mse --g_scale 0.5 --g_space rgb \
--output results/demo_bsr_wg \
--device cuda
```

You will see that the results become smoother.
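
To build intuition for what the guidance does, here is a conceptual sketch (hypothetical function and shapes, not the repo's implementation): at each sampling step, the denoised estimate is nudged toward the low-quality input along the gradient of an MSE loss, with `--g_scale` controlling the strength and `--g_space` the space in which the loss is computed.

```python
# Conceptual sketch of restoration guidance (hypothetical, not the repo code):
# pull the current denoised estimate toward the low-quality input along the
# gradient of an MSE loss computed in RGB space.
import torch
import torch.nn.functional as F

def guidance_step(x0_pred: torch.Tensor, lq: torch.Tensor, g_scale: float) -> torch.Tensor:
    x = x0_pred.detach().requires_grad_(True)
    loss = F.mse_loss(x, lq)                # fidelity term in RGB space
    (grad,) = torch.autograd.grad(loss, x)
    # larger g_scale => stronger pull toward the input => higher fidelity, smoother result
    return x0_pred - g_scale * grad
```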

#### Better Start Point For Sampling

Add the following argument to provide a better start point for reverse sampling:

```shell
[command...] --better_start
```

This option prevents our model from generating noise in the image background.
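
A plausible reading of this option (a conceptual sketch under our assumptions, not the repo's actual code) is to start reverse sampling from a noised version of the stage 1 output rather than from pure noise:

```python
# Conceptual sketch (an assumption, not the repo's actual code): initialize
# reverse sampling from the noised latent of the stage 1 (cleaned) output
# instead of pure Gaussian noise, using the standard forward-diffusion formula.
import torch

def better_start(z_stage1: torch.Tensor, alpha_bar_T: float) -> torch.Tensor:
    noise = torch.randn_like(z_stage1)
    return (alpha_bar_T ** 0.5) * z_stage1 + ((1.0 - alpha_bar_T) ** 0.5) * noise
```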

## <a name="train"></a>:stars: Train

### Stage 1

First, we train a SwinIR, which will be used for degradation removal during the training of stage 2.

<a name="gen_file_list"></a>
1. Generate file lists for the training set and the validation set. A file list looks like:

```txt
/path/to/image_1
/path/to/image_2
/path/to/image_3
...
```

You can write a simple Python script or use shell commands directly to produce the file lists. Here is a shell example; an equivalent Python sketch follows.

```shell
# collect all image files in img_dir
find [img_dir] -type f > files.list
# shuffle collected files
shuf files.list > files_shuf.list
# pick the first train_size files as the training set
head -n [train_size] files_shuf.list > files_shuf_train.list
# pick the remaining files as the validation set
tail -n +[train_size + 1] files_shuf.list > files_shuf_val.list
```
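
The same split can be produced with a small Python script (a sketch; `img_dir` and `train_size` are placeholders to adjust):

```python
# Sketch of a Python equivalent: collect file paths under img_dir, shuffle,
# and split them into training/validation file lists. Adjust the placeholders.
import random
from pathlib import Path

img_dir, train_size = Path("img_dir"), 10000  # placeholders
files = sorted(str(p) for p in img_dir.rglob("*") if p.is_file())
random.shuffle(files)
Path("files_shuf_train.list").write_text("\n".join(files[:train_size]) + "\n")
Path("files_shuf_val.list").write_text("\n".join(files[train_size:]) + "\n")
```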

2. Fill in the [training configuration file](configs/train/train_stage1.yaml) with appropriate values.

3. Start training!

```shell
accelerate launch train_stage1.py --config configs/train/train_stage1.yaml
```

### Stage 2

1. Download the pretrained [Stable Diffusion v2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) to provide generative capabilities. :bulb: If you have already run the [inference script](inference.py), the SD v2.1 checkpoint can be found in [weights](weights).

```shell
wget https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/v2-1_512-ema-pruned.ckpt --no-check-certificate
```

2. Generate a file list as described [above](#gen_file_list). Currently, the training script of stage 2 doesn't support a validation set, so you only need to create a training file list.

3. Fill in the [training configuration file](configs/train/train_stage2.yaml) with appropriate values.

4. Start training!

```shell
accelerate launch train_stage2.py --config configs/train/train_stage2.yaml
```

## Citation

Please cite us if our work is useful for your research.

```
@misc{lin2024diffbir,
      title={DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior},
      author={Xinqi Lin and Jingwen He and Ziyan Chen and Zhaoyang Lyu and Bo Dai and Fanghua Yu and Wanli Ouyang and Yu Qiao and Chao Dong},
      year={2024},
      eprint={2308.15070},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

## License

This project is released under the [Apache 2.0 license](LICENSE).

## Acknowledgement

This project is based on [ControlNet](https://github.com/lllyasviel/ControlNet) and [BasicSR](https://github.com/XPixelGroup/BasicSR). Thanks for their awesome work.

## Contact

If you have any questions, please feel free to contact me at [email protected].