---
title: S-MultiMAE
emoji: π
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.33.0
app_file: streamlit_apps/app.py
pinned: false
---
# S-MultiMAE
This repository provides the official implementation of `S-MultiMAE: A Multi-Ground-Truth Approach for RGB-D Saliency Detection`.
_Nguyen Truong Thinh Huynh, Van Linh Pham, Xuan Toan Mai and Tuan Anh Tran_

## Model weights
| Backbone | #params | Training paradigm | Weights | Input size |
| -------- | ----------- | ----------------- | ---------------------------------------------------------------------------------------------- | ---------- |
| ViT-L | 328,318,529 | Multi-GT | [Download](https://drive.google.com/file/d/1YhAuu3DI2adPLQgbgoSt74ilZbpuKihh/view?usp=sharing) | 224x224 |
| ViT-B | 107,654,977 | Multi-GT | [Download](https://drive.google.com/file/d/13Omafif3pvPKgg3Isp_srkHf8CSPx33d/view?usp=sharing) | 224x224 |
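To fetch these checkpoints from Google Drive programmatically, one option is the [gdown](https://github.com/wkentaro/gdown) package (an extra dependency, not in `requirements.txt`). Below is a minimal sketch; the output filenames follow the `weights` folder layout shown in the setup section, and the ID-to-filename mapping is an assumption, so verify which checkpoint is which after downloading:

```python
from pathlib import Path

import gdown  # pip install gdown

# File IDs taken from the Google Drive links in the table above.
# Assumption: 2006 = ViT-L, 2007 = ViT-B; verify after downloading.
WEIGHTS = {
    "weights/s-multimae-cfgv4_0_2006-top1.pth": "1YhAuu3DI2adPLQgbgoSt74ilZbpuKihh",
    "weights/s-multimae-cfgv4_0_2007-top1.pth": "13Omafif3pvPKgg3Isp_srkHf8CSPx33d",
    "weights/omnidata_rgb2depth_dpt_hybrid.pth": "1vU4G31_T2PJv1DkA8j-MLXfMjGa7kD3L",
}

Path("weights").mkdir(exist_ok=True)
for output, file_id in WEIGHTS.items():
    gdown.download(f"https://drive.google.com/uc?id={file_id}", output, quiet=False)
```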
## Demo on HuggingFace
- https://huggingface.co/spaces/RGBD-SOD/S-MultiMAE


## How to run locally
### Create a virtual environment
We recommend Python 3.10 or higher.
```bash
python3.10 -m venv env
source env/bin/activate
pip install -r requirements.txt
```
### Download trained weights
- Download the model weights and put them in the `weights` folder. You may also need to download the weights of the [DPT model](https://drive.google.com/file/d/1vU4G31_T2PJv1DkA8j-MLXfMjGa7kD3L/view?usp=sharing) (an RGB-to-depth model). The `weights` folder should look like this:
```bash
└── weights
    ├── omnidata_rgb2depth_dpt_hybrid.pth
    ├── s-multimae-cfgv4_0_2006-top1.pth
    └── s-multimae-cfgv4_0_2007-top1.pth
```
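As a sanity check, the parameter counts in the table above can be compared against a downloaded checkpoint using plain PyTorch. This is a minimal sketch, assuming each `.pth` file holds either a raw `state_dict` or a dict wrapping one; the actual checkpoint format may differ:

```python
import torch

# Load a checkpoint on CPU and count its parameters.
# Assumption: the file stores a state_dict, or a dict that wraps one.
ckpt = torch.load("weights/s-multimae-cfgv4_0_2006-top1.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
n_params = sum(t.numel() for t in state_dict.values() if torch.is_tensor(t))
print(f"{len(state_dict)} tensors, {n_params:,} parameters")
```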
### Run
- Run the Streamlit app:
```bash
streamlit run streamlit_apps/app.py --server.port 9113 --browser.gatherUsageStats False --server.fileWatcherType none
```
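The app will then be available at `http://localhost:9113` (the port set by `--server.port`).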
## Datasets
### COME15K dataset
Each sample in COME15K can have multiple ground-truth saliency maps. The table below shows the distribution of the number of GTs per sample in each subset:

| Subset                | 1 GT   | 2 GTs | 3 GTs  | 4 GTs | 5 GTs |
| --------------------- | ------ | ----- | ------ | ----- | ----- |
| COME8K (8025 samples) | 77.61% | 1.71% | 18.28% | 2.24% | 0.16% |
| COME-E (4600 samples) | 70.50% | 1.87% | 21.15% | 5.70% | 0.78% |
| COME-H (3000 samples) | 62.30% | 2.00% | 25.63% | 8.37% | 1.70% |
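A distribution like the one above can be tallied from a local copy of the dataset. The layout below (one subdirectory of GT masks per sample, named `gt_*.png`) is a hypothetical assumption for illustration, not the actual COME15K structure; adjust the glob pattern accordingly:

```python
from collections import Counter
from pathlib import Path

def gt_distribution(root: str) -> dict[int, float]:
    """Percentage of samples having 1, 2, 3, ... ground-truth masks."""
    # Hypothetical layout: <root>/<sample_id>/gt_*.png, one file per GT.
    counts = Counter(
        len(list(sample.glob("gt_*.png")))
        for sample in Path(root).iterdir()
        if sample.is_dir()
    )
    total = sum(counts.values())
    return {n_gts: 100 * c / total for n_gts, c in sorted(counts.items())}

print(gt_distribution("datasets/COME8K"))  # e.g. {1: 77.61, 2: 1.71, ...}
```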
The COME15K dataset was introduced in the following paper:
```bib
@inproceedings{cascaded_rgbd_sod,
  title     = {RGB-D Saliency Detection via Cascaded Mutual Information Minimization},
  author    = {Zhang, Jing and Fan, Deng-Ping and Dai, Yuchao and Yu, Xin and Zhong, Yiran and Barnes, Nick and Shao, Ling},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year      = {2021}
}
```
## Acknowledgements
S-MultiMAE is built on top of [MultiMAE](https://github.com/EPFL-VILAB/MultiMAE). We thank the authors for releasing their code.
```bib
@inproceedings{bachmann2022multimae,
  author    = {Roman Bachmann and David Mizrahi and Andrei Atanov and Amir Zamir},
  title     = {{MultiMAE}: Multi-modal Multi-task Masked Autoencoders},
  booktitle = {European Conference on Computer Vision},
  year      = {2022},
}
```
## References
All references are cited in these files:
- [Datasets](./docs/references/Dataset.bib)
- [SOTAs](./docs/references/SOTAs.bib)
- [Others](./docs/references/References.bib)