Spaces: Runtime error
Update README.md
README.md CHANGED
@@ -1,770 +1,11 @@
**🍄 Why Build this Project?**

The **core idea** behind this project is to **combine the strengths of different models in order to build a very powerful pipeline for solving complex problems**. It is worth mentioning that this is a workflow for combining strong expert models, where **all parts can be used separately or in combination, and can be replaced with similar but different models (e.g., replacing Grounding DINO with GLIP or other detectors, replacing Stable-Diffusion with ControlNet or GLIGEN, or combining with ChatGPT)**.

**🍇 Updates**

- **`2023/12/17`** Support [Grounded-RepViT-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-repvit-sam-demo) demo, thanks a lot for their great work!
- **`2023/12/16`** Support [Grounded-Edge-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-edge-sam-demo) demo, thanks a lot for their great work!
- **`2023/12/10`** Support [Grounded-Efficient-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-efficient-sam-demo) demo, thanks a lot for their great work!
- **`2023/11/24`** Release [RAM++](https://arxiv.org/abs/2310.15200), the next generation of RAM. RAM++ can recognize any category with high accuracy, including both predefined common categories and diverse open-set categories.
- **`2023/11/23`** Release our newly proposed visual prompt counting model [T-Rex](https://github.com/IDEA-Research/T-Rex). The introduction [Video](https://www.youtube.com/watch?v=engIEhZogAQ) and [Demo](https://deepdataspace.com/playground/ivp) are available in [DDS](https://github.com/IDEA-Research/deepdataspace) now.
- **`2023/07/25`** Support [Light-HQ-SAM](https://github.com/SysCV/sam-hq) in [EfficientSAM](./EfficientSAM/), credits to [Mingqiao Ye](https://github.com/ymq2017) and [Lei Ke](https://github.com/lkeab), thanks a lot for their great work!
- **`2023/07/14`** Combining **Grounding-DINO-B** with [SAM-HQ](https://github.com/SysCV/sam-hq) achieves **49.6 mean AP** in the [Segmentation in the Wild](https://eval.ai/web/challenges/challenge-page/1931/overview) competition zero-shot track, surpassing Grounded-SAM by **3.6 mean AP**, thanks for their great work!
- **`2023/06/28`** Combining Grounding-DINO with Efficient SAM variants, including [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM) and [MobileSAM](https://github.com/ChaoningZhang/MobileSAM), in [EfficientSAM](./EfficientSAM/) for faster annotating, thanks a lot for their great work!
- **`2023/06/20`** By combining **Grounding-DINO-L** with **SAM-ViT-H**, Grounded-SAM achieves 46.0 mean AP in the [Segmentation in the Wild](https://eval.ai/web/challenges/challenge-page/1931/overview) competition zero-shot track at the [CVPR 2023 workshop](https://computer-vision-in-the-wild.github.io/cvpr-2023/), surpassing [UNINEXT (CVPR 2023)](https://github.com/MasterBin-IIAU/UNINEXT) by about **4 mean AP**.
- **`2023/06/16`** Release the [RAM-Grounded-SAM Replicate Online Demo](https://replicate.com/cjwbw/ram-grounded-sam). Thanks a lot to [Chenxi](https://chenxwh.github.io/) for providing this nice demo 🌹.
- **`2023/06/14`** Support [RAM-Grounded-SAM & SAM-HQ](./automatic_label_ram_demo.py) and update the [Simple Automatic Label Demo](./automatic_label_ram_demo.py) to support [RAM](https://github.com/OPPOMKLab/recognize-anything), setting up a strong automatic annotation pipeline.
- **`2023/06/13`** Check out the [Autodistill: Train YOLOv8 with ZERO Annotations](https://youtu.be/gKTYMfwPo4M) tutorial to learn how to use Grounded-SAM + [Autodistill](https://github.com/autodistill/autodistill) for automated data labeling and real-time model training.
- **`2023/06/13`** Support [SAM-HQ](https://github.com/SysCV/sam-hq) in the [Grounded-SAM Demo](#running_man-grounded-sam-detect-and-segment-everything-with-text-prompt) for higher quality predictions.
- **`2023/06/12`** Support [RAM-Grounded-SAM](#label-grounded-sam-with-ram-or-tag2text-for-automatic-labeling) for a strong automatic labeling pipeline! Thanks to [Recognize-Anything](https://github.com/OPPOMKLab/recognize-anything).
- **`2023/06/01`** Our Grounded-SAM has been accepted to present a **demo** at [ICCV 2023](https://iccv2023.thecvf.com/)! See you in Paris!
- **`2023/05/23`** Support `Image-Referring-Segment`, `Audio-Referring-Segment` and `Text-Referring-Segment` in [ImageBind-SAM](./playground/ImageBind_SAM/).
- **`2023/05/03`** Check out the [Automated Dataset Annotation and Evaluation with GroundingDINO and SAM](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/automated-dataset-annotation-and-evaluation-with-grounding-dino-and-sam.ipynb) notebook, which is an amazing tutorial on automatic labeling! Thanks a lot to [Piotr Skalski](https://github.com/SkalskiP) and [Roboflow](https://github.com/roboflow/notebooks)!
## Table of Contents

- [Grounded-Segment-Anything](#grounded-segment-anything)
  - [Preliminary Works](#preliminary-works)
  - [Highlighted Projects](#highlighted-projects)
  - [Installation](#installation)
    - [Install with Docker](#install-with-docker)
    - [Install locally](#install-without-docker)
  - [Grounded-SAM Playground](#grounded-sam-playground)
    - [Step-by-Step Notebook Demo](#open_book-step-by-step-notebook-demo)
    - [GroundingDINO: Detect Everything with Text Prompt](#running_man-groundingdino-detect-everything-with-text-prompt)
    - [Grounded-SAM: Detect and Segment Everything with Text Prompt](#running_man-grounded-sam-detect-and-segment-everything-with-text-prompt)
    - [Grounded-SAM with Inpainting: Detect, Segment and Generate Everything with Text Prompt](#skier-grounded-sam-with-inpainting-detect-segment-and-generate-everything-with-text-prompt)
    - [Grounded-SAM and Inpaint Gradio APP](#golfing-grounded-sam-and-inpaint-gradio-app)
    - [Grounded-SAM with RAM or Tag2Text for Automatic Labeling](#label-grounded-sam-with-ram-or-tag2text-for-automatic-labeling)
    - [Grounded-SAM with BLIP & ChatGPT for Automatic Labeling](#robot-grounded-sam-with-blip-for-automatic-labeling)
    - [Grounded-SAM with Whisper: Detect and Segment Anything with Audio](#open_mouth-grounded-sam-with-whisper-detect-and-segment-anything-with-audio)
    - [Grounded-SAM ChatBot with Visual ChatGPT](#speech_balloon-grounded-sam-chatbot-demo)
    - [Grounded-SAM with OSX for 3D Whole-Body Mesh Recovery](#man_dancing-run-grounded-segment-anything--osx-demo)
    - [Grounded-SAM with VISAM for Tracking and Segment Anything](#man_dancing-run-grounded-segment-anything--visam-demo)
    - [Interactive Fashion-Edit Playground: Click for Segmentation And Editing](#dancers-interactive-editing)
    - [Interactive Human-face Editing Playground: Click And Editing Human Face](#dancers-interactive-editing)
    - [3D Box Via Segment Anything](#camera-3d-box-via-segment-anything)
  - [Playground: More Interesting and Imaginative Demos with Grounded-SAM](./playground/)
    - [DeepFloyd: Image Generation with Text Prompt](./playground/DeepFloyd/)
    - [PaintByExample: Exemplar-based Image Editing with Diffusion Models](./playground/PaintByExample/)
    - [LaMa: Resolution-robust Large Mask Inpainting with Fourier Convolutions](./playground/LaMa/)
    - [RePaint: Inpainting using Denoising Diffusion Probabilistic Models](./playground/RePaint/)
    - [ImageBind with SAM: Segment with Different Modalities](./playground/ImageBind_SAM/)
  - [Efficient SAM Series for Faster Annotation](./EfficientSAM/)
    - [Grounded-FastSAM Demo](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-fastsam-demo)
    - [Grounded-MobileSAM Demo](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-mobilesam-demo)
    - [Grounded-Light-HQSAM Demo](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-light-hqsam-demo)
    - [Grounded-Efficient-SAM Demo](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-efficient-sam-demo)
    - [Grounded-Edge-SAM Demo](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-edge-sam-demo)
    - [Grounded-RepViT-SAM Demo](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-repvit-sam-demo)
## Preliminary Works

Here we provide some background knowledge that you may need to know before trying the demos.

<div align="center">

| Title | Intro | Description | Links |
|:----:|:----:|:----:|:----:|
| [Segment-Anything](https://arxiv.org/abs/2304.02643) |  | A strong foundation model that aims to segment everything in an image; it needs prompts (boxes/points/text) to generate masks. | [[Github](https://github.com/facebookresearch/segment-anything)] <br> [[Page](https://segment-anything.com/)] <br> [[Demo](https://segment-anything.com/demo)] |
| [Grounding DINO](https://arxiv.org/abs/2303.05499) |  | A strong zero-shot detector which is capable of generating high-quality boxes and labels from free-form text. | [[Github](https://github.com/IDEA-Research/GroundingDINO)] <br> [[Demo](https://huggingface.co/spaces/ShilongLiu/Grounding_DINO_demo)] |
| [OSX](http://arxiv.org/abs/2303.16160) |  | A strong and efficient one-stage motion capture method that generates high-quality 3D human meshes from a monocular image. OSX also releases a large-scale upper-body dataset, UBody, for more accurate reconstruction in upper-body scenes. | [[Github](https://github.com/IDEA-Research/OSX)] <br> [[Page](https://osx-ubody.github.io/)] <br> [[Video](https://osx-ubody.github.io/)] <br> [[Data](https://docs.google.com/forms/d/e/1FAIpQLSehgBP7wdn_XznGAM2AiJPiPLTqXXHw5uX9l7qeQ1Dh9HoO_A/viewform)] |
| [Stable-Diffusion](https://arxiv.org/abs/2112.10752) |  | A super powerful open-source latent text-to-image diffusion model. | [[Github](https://github.com/CompVis/stable-diffusion)] <br> [[Page](https://ommer-lab.com/research/latent-diffusion-models/)] |
| [RAM++](https://arxiv.org/abs/2310.15200) |  | RAM++ is the next generation of RAM, which can recognize any category with high accuracy. | [[Github](https://github.com/OPPOMKLab/recognize-anything)] |
| [RAM](https://recognize-anything.github.io/) |  | RAM is an image tagging model, which can recognize any common category with high accuracy. | [[Github](https://github.com/OPPOMKLab/recognize-anything)] <br> [[Demo](https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text)] |
| [BLIP](https://arxiv.org/abs/2201.12086) |  | A wonderful language-vision model for image understanding. | [[GitHub](https://github.com/salesforce/LAVIS)] |
| [Visual ChatGPT](https://arxiv.org/abs/2303.04671) |  | A wonderful tool that connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. | [[Github](https://github.com/microsoft/TaskMatrix)] <br> [[Demo](https://huggingface.co/spaces/microsoft/visual_chatgpt)] |
| [Tag2Text](https://tag2text.github.io/) |  | An efficient and controllable vision-language model which can simultaneously output superior image captioning and image tagging. | [[Github](https://github.com/OPPOMKLab/recognize-anything)] <br> [[Demo](https://huggingface.co/spaces/xinyu1205/Tag2Text)] |
| [VoxelNeXt](https://arxiv.org/abs/2303.11301) |  | A clean, simple, and fully-sparse 3D object detector, which predicts objects directly upon sparse voxel features. | [[Github](https://github.com/dvlab-research/VoxelNeXt)] |

</div>
## Highlighted Projects

Here we provide some impressive works you may find interesting:

<div align="center">

| Title | Description | Links |
|:---:|:---:|:---:|
| [Semantic-SAM](https://github.com/UX-Decoder/Semantic-SAM) | A universal image segmentation model to enable segment and recognize anything at any desired granularity. | [[Github](https://github.com/UX-Decoder/Semantic-SAM)] <br> [[Demo](https://github.com/UX-Decoder/Semantic-SAM)] |
| [SEEM: Segment Everything Everywhere All at Once](https://arxiv.org/pdf/2304.06718.pdf) | A powerful promptable segmentation model that supports segmenting with various types of prompts (text, point, scribble, referring image, etc.) and any combination of prompts. | [[Github](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once)] <br> [[Demo](https://huggingface.co/spaces/xdecoder/SEEM)] |
| [OpenSeeD](https://arxiv.org/pdf/2303.08131.pdf) | A simple framework for open-vocabulary segmentation and detection which supports interactive segmentation with box input to generate masks. | [[Github](https://github.com/IDEA-Research/OpenSeeD)] |
| [LLaVA](https://arxiv.org/abs/2304.08485) | Visual instruction tuning with GPT-4. | [[Github](https://github.com/haotian-liu/LLaVA)] <br> [[Page](https://llava-vl.github.io/)] <br> [[Demo](https://llava.hliu.cc/)] <br> [[Data](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K)] <br> [[Model](https://huggingface.co/liuhaotian/LLaVA-13b-delta-v0)] |
| [GenSAM](https://arxiv.org/abs/2312.07374) | Relaxing the instance-specific manual prompt requirement in SAM through training-free test-time adaptation. | [[Github](https://github.com/jyLin8100/GenSAM)] <br> [[Page](https://lwpyh.github.io/GenSAM/)] |

</div>

We also list some awesome segment-anything extension projects here that you may find interesting:

- [Computer Vision in the Wild (CVinW) Readings](https://github.com/Computer-Vision-in-the-Wild/CVinW_Readings) for those who are interested in open-set tasks in computer vision.
- [Zero-Shot Anomaly Detection](https://github.com/caoyunkang/GroundedSAM-zero-shot-anomaly-detection) by Yunkang Cao
- [EditAnything: ControlNet + StableDiffusion based on the SAM segmentation mask](https://github.com/sail-sg/EditAnything) by Shanghua Gao and Pan Zhou
- [IEA: Image Editing Anything](https://github.com/feizc/IEA) by Zhengcong Fei
- [SAM-MMRotate: Combining Rotated Object Detector and SAM](https://github.com/Li-Qingyun/sam-mmrotate) by Qingyun Li and Xue Yang
- [Awesome-Anything](https://github.com/VainF/Awesome-Anything) by Gongfan Fang
- [Prompt-Segment-Anything](https://github.com/RockeyCoss/Prompt-Segment-Anything) by Rockey
- [WebUI for Segment-Anything and Grounded-SAM](https://github.com/continue-revolution/sd-webui-segment-anything) by Chengsong Zhang
- [Inpainting Anything: Inpaint Anything with SAM + Inpainting models](https://github.com/geekyutao/Inpaint-Anything) by Tao Yu
- [Grounded Segment Anything From Objects to Parts: Combining Segment-Anything with VLPart & GLIP & Visual ChatGPT](https://github.com/Cheems-Seminar/segment-anything-and-name-it) by Peize Sun and Shoufa Chen
- [Napari-SAM: Integration of Segment Anything into Napari (a nice viewer for SAM)](https://github.com/MIC-DKFZ/napari-sam) by MIC-DKFZ
- [Grounded Segment Anything Colab](https://github.com/camenduru/grounded-segment-anything-colab) by camenduru
- [Optical Character Recognition with Segment Anything](https://github.com/yeungchenwa/OCR-SAM) by Zhenhua Yang
- [Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet](https://github.com/showlab/Image2Paragraph) by showlab
- [Lang-Segment-Anything: Another awesome demo for combining GroundingDINO with Segment-Anything](https://github.com/luca-medeiros/lang-segment-anything) by Luca Medeiros
- [🥳 🚀 **Playground: Integrate SAM and OpenMMLab!**](https://github.com/open-mmlab/playground)
- [3D-object via Segment Anything](https://github.com/dvlab-research/3D-Box-Segment-Anything) by Yukang Chen
- [Image2Paragraph: Transform Image Into Unique Paragraph](https://github.com/showlab/Image2Paragraph) by Show Lab
- [Zero-shot Scene Graph Generation with Grounded-SAM](https://github.com/showlab/Image2Paragraph) by JackWhite-rwx
- [CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks](https://github.com/xmed-lab/CLIP_Surgery) by Eli-YiLi
- [Panoptic-Segment-Anything: Zero-shot panoptic segmentation using SAM](https://github.com/segments-ai/panoptic-segment-anything) by segments-ai
- [Caption-Anything: Generates Descriptive Captions for Any Object within an Image](https://github.com/ttengwang/Caption-Anything) by Teng Wang
- [Segment-Anything-3D: Transferring Segmentation Information of 2D Images to 3D Space](https://github.com/Pointcept/SegmentAnything3D) by Yunhan Yang
- [Expediting SAM without Fine-tuning](https://github.com/Expedit-LargeScale-Vision-Transformer/Expedit-SAM) by Weicong Liang and Yuhui Yuan
- [Semantic Segment Anything: Providing Rich Semantic Category Annotations for SAM](https://github.com/fudan-zvg/Semantic-Segment-Anything) by Jiaqi Chen, Zeyu Yang and Li Zhang
- [Enhance Everything: Combining SAM with Image Restoration and Enhancement Tasks](https://github.com/lixinustc/Enhance-Anything) by Xin Li
- [DragGAN](https://github.com/Zeqiang-Lai/DragGAN) by Shanghai AI Lab
## Installation

The code requires `python>=3.8`, as well as `pytorch>=1.7` and `torchvision>=0.8`. Please follow the instructions [here](https://pytorch.org/get-started/locally/) to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.

### Install with Docker

Open one terminal:

```
make build-image
```

```
make run
```

That's it.

If you would like to allow visualization across the Docker container, open another terminal and type:

```
xhost +
```

### Install without Docker

You should set the environment variables manually as follows if you want to build a local GPU environment for Grounded-SAM:

```bash
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/path/to/cuda-11.3/
```
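Before compiling the CUDA extensions below, it can help to confirm that your PyTorch build actually sees CUDA. This quick check is not part of the original instructions, just an optional sanity test:

```bash
# Optional: verify the PyTorch / CUDA setup required above
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```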
Install Segment Anything:

```bash
python -m pip install -e segment_anything
```

Install Grounding DINO:

```bash
python -m pip install -e GroundingDINO
```

Install diffusers:

```bash
pip install --upgrade diffusers[torch]
```

Install OSX:

```bash
git submodule update --init --recursive
cd grounded-sam-osx && bash install.sh
```

Install RAM & Tag2Text:

```bash
git clone https://github.com/xinyu1205/recognize-anything.git
pip install -r ./recognize-anything/requirements.txt
pip install -e ./recognize-anything/
```

The following optional dependencies are necessary for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format. `jupyter` is also required to run the example notebooks.

```
pip install opencv-python pycocotools matplotlib onnxruntime onnx ipykernel
```
More details can be found in [install Segment Anything](https://github.com/facebookresearch/segment-anything#installation), [install GroundingDINO](https://github.com/IDEA-Research/GroundingDINO#install) and [install OSX](https://github.com/IDEA-Research/OSX).

## Grounded-SAM Playground

Let's start exploring the Grounded-SAM Playground. We will release more interesting demos in the future, stay tuned!

## :open_book: Step-by-Step Notebook Demo

Here we list the notebook demos provided in this project:
- [grounded_sam.ipynb](grounded_sam.ipynb)
- [grounded_sam_colab_demo.ipynb](grounded_sam_colab_demo.ipynb)
- [grounded_sam_3d_box.ipynb](grounded_sam_3d_box.ipynb)
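If you have not used the notebooks before, `jupyter` (listed with the optional dependencies above) is all that is needed to open them, for example:

```bash
# open one of the demo notebooks locally
jupyter notebook grounded_sam.ipynb
```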
### :running_man: GroundingDINO: Detect Everything with Text Prompt

:grapes: [[arXiv Paper](https://arxiv.org/abs/2303.05499)] :rose: [[Try the Colab Demo](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/zero-shot-object-detection-with-grounding-dino.ipynb)] :sunflower: [[Try Huggingface Demo](https://huggingface.co/spaces/ShilongLiu/Grounding_DINO_demo)] :mushroom: [[Automated Dataset Annotation and Evaluation](https://youtu.be/C4NqaRBz_Kw)]

Here's the step-by-step tutorial on running the `GroundingDINO` demo:

**Step 1: Download the pretrained weights**

```bash
cd Grounded-Segment-Anything

# download the pretrained groundingdino-swin-tiny model
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
```

**Step 2: Running the demo**

```bash
python grounding_dino_demo.py
```
<details>
<summary> <b> Running with Python (same as the demo, but you can run it anywhere after installing GroundingDINO) </b> </summary>

```python
from groundingdino.util.inference import load_model, load_image, predict, annotate
import cv2

model = load_model("GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py", "./groundingdino_swint_ogc.pth")
IMAGE_PATH = "assets/demo1.jpg"
TEXT_PROMPT = "bear."
BOX_THRESHOLD = 0.35
TEXT_THRESHOLD = 0.25

image_source, image = load_image(IMAGE_PATH)

boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=BOX_THRESHOLD,
    text_threshold=TEXT_THRESHOLD
)

annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("annotated_image.jpg", annotated_frame)
```

</details>
<br>

**Tips**
- If you want to detect multiple objects in one sentence with [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO), we suggest separating each name with `.`. An example: `cat . dog . chair .`

**Step 3: Check the annotated image**

The annotated image will be saved as `./annotated_image.jpg`.

<div align="center">

| Text Prompt | Demo Image | Annotated Image |
|:----:|:----:|:----:|
| `Bear.` |  |  |
| `Horse. Clouds. Grasses. Sky. Hill` |  |  |

</div>
### :running_man: Grounded-SAM: Detect and Segment Everything with Text Prompt

Here's the step-by-step tutorial on running the `Grounded-SAM` demo:

**Step 1: Download the pretrained weights**

```bash
cd Grounded-Segment-Anything

wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
```

We provide two versions of the Grounded-SAM demo here:
- [grounded_sam_demo.py](./grounded_sam_demo.py): our original implementation of Grounded-SAM.
- [grounded_sam_simple_demo.py](./grounded_sam_simple_demo.py): our updated, more elegant version of Grounded-SAM.

**Step 2: Running the original grounded-sam demo**

```bash
export CUDA_VISIBLE_DEVICES=0
python grounded_sam_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo1.jpg \
  --output_dir "outputs" \
  --box_threshold 0.3 \
  --text_threshold 0.25 \
  --text_prompt "bear" \
  --device "cuda"
```

The annotated results will be saved in `./outputs` as follows:

<div align="center">

| Input Image | Annotated Image | Generated Mask |
|:----:|:----:|:----:|
|  |  |  |

</div>
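If you would rather drive the same pipeline from Python instead of the CLI, the sketch below chains Grounding DINO detections into SAM box prompts. This is a minimal sketch, not the repo's exact script: it assumes the two checkpoints downloaded in Step 1 are in the working directory and roughly mirrors what `grounded_sam_simple_demo.py` does.

```python
import cv2
import numpy as np
import torch
from torchvision.ops import box_convert
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

# 1) Detect boxes with Grounding DINO from a text prompt
model = load_model("GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
                   "./groundingdino_swint_ogc.pth")
image_source, image = load_image("assets/demo1.jpg")
boxes, logits, phrases = predict(model=model, image=image, caption="bear.",
                                 box_threshold=0.3, text_threshold=0.25)

# 2) Convert the normalized cxcywh boxes to absolute xyxy coordinates
h, w, _ = image_source.shape
boxes_xyxy = box_convert(boxes * torch.tensor([w, h, w, h]), in_fmt="cxcywh", out_fmt="xyxy")

# 3) Prompt SAM with the detected boxes to get one mask per box
sam = sam_model_registry["vit_h"](checkpoint="./sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)
predictor.set_image(image_source)
boxes_t = predictor.transform.apply_boxes_torch(boxes_xyxy, image_source.shape[:2]).to("cuda")
masks, _, _ = predictor.predict_torch(point_coords=None, point_labels=None,
                                      boxes=boxes_t, multimask_output=False)

# 4) Save a simple overlay of the union of all predicted masks
union = masks.any(dim=0)[0].cpu().numpy()
overlay = image_source.copy()
overlay[union] = (0.5 * overlay[union] + 0.5 * np.array([30, 144, 255])).astype(np.uint8)
cv2.imwrite("grounded_sam_sketch.jpg", cv2.cvtColor(overlay, cv2.COLOR_RGB2BGR))
```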
**Step 3: Running the grounded-sam demo with SAM-HQ**
- Download the demo image
```bash
wget https://github.com/IDEA-Research/detrex-storage/releases/download/grounded-sam-storage/sam_hq_demo_image.png
```

- Download the SAM-HQ checkpoint [here](https://github.com/SysCV/sam-hq#model-checkpoints)

- Run the grounded-sam-hq demo as follows:
```bash
export CUDA_VISIBLE_DEVICES=0
# --use_sam_hq switches the demo to SAM-HQ, loaded from --sam_hq_checkpoint
python grounded_sam_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_hq_checkpoint ./sam_hq_vit_h.pth \
  --use_sam_hq \
  --input_image sam_hq_demo_image.png \
  --output_dir "outputs" \
  --box_threshold 0.3 \
  --text_threshold 0.25 \
  --text_prompt "chair." \
  --device "cuda"
```

The annotated results will be saved in `./outputs` as follows:

<div align="center">

| Input Image | SAM Output | SAM-HQ Output |
|:----:|:----:|:----:|
|  |  |  |

</div>

**Step 4: Running the updated grounded-sam demo (optional)**

Note that this demo is almost the same as the original demo, but **with more elegant code**.

```bash
python grounded_sam_simple_demo.py
```

The annotated results will be saved as `./groundingdino_annotated_image.jpg` and `./grounded_sam_annotated_image.jpg`.

<div align="center">

| Text Prompt | Input Image | GroundingDINO Annotated Image | Grounded-SAM Annotated Image |
|:----:|:----:|:----:|:----:|
| `The running dog` |  |  |  |
| `Horse. Clouds. Grasses. Sky. Hill` |  |  |  |

</div>

### :skier: Grounded-SAM with Inpainting: Detect, Segment and Generate Everything with Text Prompt

**Step 1: Download the pretrained weights**

```bash
cd Grounded-Segment-Anything

wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
```

**Step 2: Running the grounded-sam inpainting demo**

```bash
export CUDA_VISIBLE_DEVICES=0
python grounded_sam_inpainting_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/inpaint_demo.jpg \
  --output_dir "outputs" \
  --box_threshold 0.3 \
  --text_threshold 0.25 \
  --det_prompt "bench" \
  --inpaint_prompt "A sofa, high quality, detailed" \
  --device "cuda"
```

The annotated and inpainted images will be saved in `./outputs`.

**Step 3: Check the results**

<div align="center">

| Input Image | Det Prompt | Annotated Image | Inpaint Prompt | Inpaint Image |
|:---:|:---:|:---:|:---:|:---:|
|  | `Bench` |  | `A sofa, high quality, detailed` |  |

</div>
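The demo script wires the SAM mask of the detected object straight into Stable Diffusion inpainting. If you want to reproduce just that last step yourself, here is a minimal sketch using the `diffusers` inpainting pipeline. It is an illustration rather than the repo's implementation: the mask file name is a hypothetical placeholder (any HxW boolean mask from SAM works), and `runwayml/stable-diffusion-inpainting` is just one checkpoint that fits this pipeline.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Load the input image and a SAM mask for the object to replace
# (the .npy file is a hypothetical placeholder for whatever mask you produced)
image = Image.open("assets/inpaint_demo.jpg").convert("RGB").resize((512, 512))
mask = np.load("bench_mask.npy")
mask_image = Image.fromarray(mask.astype(np.uint8) * 255).resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# White pixels in mask_image are repainted according to the prompt
result = pipe(prompt="A sofa, high quality, detailed",
              image=image, mask_image=mask_image).images[0]
result.save("inpainted_sketch.jpg")
```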
### :golfing: Grounded-SAM and Inpaint Gradio APP

We support 6 tasks in the local Gradio APP:

1. **scribble**: Segmentation with Segment Anything through mouse-click interaction (you click on the object; no text prompt needs to be specified).
2. **automask**: Segment the entire image at once with Segment Anything (no prompt needs to be specified).
3. **det**: Detection with Grounding DINO through text interaction (a text prompt needs to be specified).
4. **seg**: Detection + segmentation by combining Grounding DINO and Segment Anything through text interaction (a text prompt needs to be specified).
5. **inpainting**: Replace the target object by combining Grounding DINO + Segment Anything + Stable Diffusion (a text prompt and an inpaint prompt need to be specified).
6. **automatic**: Non-interactive detection + segmentation by combining BLIP + Grounding DINO + Segment Anything (no prompt needs to be specified).

```bash
python gradio_app.py
```
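For orientation, here is a stripped-down sketch of how such a multi-task Gradio app is typically wired up. It is not the repo's `gradio_app.py`; the `run_task` function is a placeholder where the real app dispatches to SAM, Grounding DINO and Stable Diffusion.

```python
import gradio as gr
from PIL import Image

TASKS = ["scribble", "automask", "det", "seg", "inpainting", "automatic"]

def run_task(image: Image.Image, task: str, text_prompt: str, inpaint_prompt: str):
    # Placeholder: the real app routes each task to the corresponding model pipeline
    return image

demo = gr.Interface(
    fn=run_task,
    inputs=[
        gr.Image(type="pil"),
        gr.Dropdown(TASKS, value="seg", label="task"),
        gr.Textbox(label="text prompt"),
        gr.Textbox(label="inpaint prompt"),
    ],
    outputs=gr.Image(type="pil"),
)
demo.launch()
```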
- The gradio_app visualization is as follows:

### :label: Grounded-SAM with RAM or Tag2Text for Automatic Labeling

[**The Recognize Anything Models**](https://github.com/OPPOMKLab/recognize-anything) are a series of strong, open-source foundational image recognition models, including [RAM++](https://arxiv.org/abs/2310.15200), [RAM](https://arxiv.org/abs/2306.03514) and [Tag2Text](https://arxiv.org/abs/2303.05657).

They link seamlessly with Grounded-SAM to generate pseudo labels automatically as follows:
1. Use RAM/Tag2Text to generate tags.
2. Use Grounded-Segment-Anything to generate the boxes and masks.

**Step 1: Init the submodule and download the pretrained checkpoints**

- Init the submodule:

```bash
cd Grounded-Segment-Anything
git submodule init
git submodule update
```

- Download the pretrained weights for `GroundingDINO`, `SAM` and `RAM/Tag2Text`:

```bash
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth

wget https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/ram_swin_large_14m.pth
wget https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/tag2text_swin_14m.pth
```

**Step 2: Running the demo with RAM**
```bash
export CUDA_VISIBLE_DEVICES=0
python automatic_label_ram_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --ram_checkpoint ram_swin_large_14m.pth \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo9.jpg \
  --output_dir "outputs" \
  --box_threshold 0.25 \
  --text_threshold 0.2 \
  --iou_threshold 0.5 \
  --device "cuda"
```

**Step 2 (alternative): Running the demo with Tag2Text**
```bash
export CUDA_VISIBLE_DEVICES=0
python automatic_label_tag2text_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --tag2text_checkpoint tag2text_swin_14m.pth \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo9.jpg \
  --output_dir "outputs" \
  --box_threshold 0.25 \
  --text_threshold 0.2 \
  --iou_threshold 0.5 \
  --device "cuda"
```

- RAM++ significantly improves the open-set capability of RAM; see [RAM++ inference on unseen categories (open-set)](https://github.com/xinyu1205/recognize-anything#ram-inference-on-unseen-categories-open-set).
- Tag2Text also provides powerful captioning capabilities; for the caption-based process, refer to [BLIP](#robot-grounded-sam-with-blip-for-automatic-labeling).
- The pseudo labels and model prediction visualization will be saved in `output_dir` as follows (right figure):
### :robot: Grounded-SAM with BLIP for Automatic Labeling

It is easy to generate pseudo labels automatically as follows (a minimal captioning sketch for step 1 follows this list):
1. Use BLIP (or another caption model) to generate a caption.
2. Extract tags from the caption. We use ChatGPT to handle potentially complicated sentences.
3. Use Grounded-Segment-Anything to generate the boxes and masks.
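As a concrete illustration of step 1, here is a minimal captioning sketch using the BLIP port in Hugging Face `transformers`. This is an assumption on our side: the repo's `automatic_label_demo.py` loads BLIP through its own checkpoints rather than this API, so treat it as an equivalent, not the script's code.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("assets/demo3.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(out[0], skip_special_tokens=True)
print(caption)  # tags are then extracted from this caption (via ChatGPT or NLTK)
```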
- Run Demo
```bash
export OPENAI_API_KEY=your_openai_key
export OPENAI_API_BASE=https://closeai.deno.dev/v1
export CUDA_VISIBLE_DEVICES=0
python automatic_label_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo3.jpg \
  --output_dir "outputs" \
  --openai_key $OPENAI_API_KEY \
  --box_threshold 0.25 \
  --text_threshold 0.2 \
  --iou_threshold 0.5 \
  --device "cuda"
```

- If you don't have a paid account for ChatGPT, it is also possible to use NLTK instead. Just don't include the `openai_key` parameter when starting the demo.
- The script will automatically download the necessary NLTK data.
- The pseudo labels and model prediction visualization will be saved in `output_dir` as follows:

### :open_mouth: Grounded-SAM with Whisper: Detect and Segment Anything with Audio

Detect and segment anything with speech!

**Install Whisper**
```bash
pip install -U openai-whisper
```
See the [whisper official page](https://github.com/openai/whisper#setup) if you have other questions about the installation.

**Run Voice-to-Label Demo**

Optional: Download the demo audio file

```bash
wget https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/demo_audio.mp3
```

```bash
export CUDA_VISIBLE_DEVICES=0
python grounded_sam_whisper_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo4.jpg \
  --output_dir "outputs" \
  --box_threshold 0.3 \
  --text_threshold 0.25 \
  --speech_file "demo_audio.mp3" \
  --device "cuda"
```
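Under the hood, the demo first transcribes the audio into a text prompt and then runs the usual Grounded-SAM pipeline on it. A minimal sketch of just the transcription step, using the `openai-whisper` package installed above:

```python
import whisper

# Transcribe the demo audio; the resulting text is what gets used as the detection prompt
model = whisper.load_model("base")
result = model.transcribe("demo_audio.mp3")
print(result["text"])
```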
**Run Voice-to-Inpaint Demo**

You can enable ChatGPT to automatically determine the object to detect and the inpainting instruction with `--enable_chatgpt`.

Or you can specify the object you want to inpaint [stored in `args.det_speech_file`] and the text you want to inpaint with [stored in `args.inpaint_speech_file`].

```bash
export OPENAI_API_KEY=your_openai_key
export OPENAI_API_BASE=https://closeai.deno.dev/v1
# Example: enable chatgpt
export CUDA_VISIBLE_DEVICES=0
python grounded_sam_whisper_inpainting_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/inpaint_demo.jpg \
  --output_dir "outputs" \
  --box_threshold 0.3 \
  --text_threshold 0.25 \
  --prompt_speech_file assets/acoustics/prompt_speech_file.mp3 \
  --enable_chatgpt \
  --openai_key $OPENAI_API_KEY \
  --device "cuda"
```

```bash
# Example: without chatgpt
export CUDA_VISIBLE_DEVICES=0
python grounded_sam_whisper_inpainting_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/inpaint_demo.jpg \
  --output_dir "outputs" \
  --box_threshold 0.3 \
  --text_threshold 0.25 \
  --det_speech_file "assets/acoustics/det_voice.mp3" \
  --inpaint_speech_file "assets/acoustics/inpaint_voice.mp3" \
  --device "cuda"
```

### :speech_balloon: Grounded-SAM ChatBot Demo

https://user-images.githubusercontent.com/24236723/231955561-2ae4ec1a-c75f-4cc5-9b7b-517aa1432123.mp4

Following [Visual ChatGPT](https://github.com/microsoft/visual-chatgpt), we add a ChatBot to our project. Currently, it supports:
1. "Describe the image."
2. "Detect the dog (and the cat) in the image."
3. "Segment anything in the image."
4. "Segment the dog (and the cat) in the image."
5. "Help me label the image."
6. "Replace the dog with a cat in the image."

To use the ChatBot:
- Install whisper if you want to use audio as input.
- Set the default model setting in the tool `Grounded_dino_sam_inpainting`.
- Run Demo
```bash
export OPENAI_API_KEY=your_openai_key
export OPENAI_API_BASE=https://closeai.deno.dev/v1
export CUDA_VISIBLE_DEVICES=0
python chatbot.py
```

### :man_dancing: Run Grounded-Segment-Anything + OSX Demo

<p align="middle">
<img src="assets/osx/grouned_sam_osx_demo.gif">
<br>
</p>

- Download the checkpoint `osx_l_wo_decoder.pth.tar` from [here](https://drive.google.com/drive/folders/1x7MZbB6eAlrq5PKC9MaeIm4GqkBpokow?usp=share_link) for OSX.
- Download the human model files and place them into `grounded-sam-osx/utils/human_model_files` following the instructions of [OSX](https://github.com/IDEA-Research/OSX).

- Run Demo

```shell
export CUDA_VISIBLE_DEVICES=0
python grounded_sam_osx_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --osx_checkpoint osx_l_wo_decoder.pth.tar \
  --input_image assets/osx/grounded_sam_osx_demo.png \
  --output_dir "outputs" \
  --box_threshold 0.3 \
  --text_threshold 0.25 \
  --text_prompt "humans, chairs" \
  --device "cuda"
```

- The model prediction visualization will be saved in `output_dir` as follows:

<img src="assets/osx/grounded_sam_osx_output.jpg" style="zoom: 49%;" />

- We also support promptable 3D whole-body mesh recovery. For example, you can track someone with a text prompt and estimate their 3D pose and shape:

|  |
| :---------------------------------------------------: |
| *A person with pink clothes* |

|  |
| :---------------------------------------------------: |
| *A man with sunglasses* |

### :man_dancing: Run Grounded-Segment-Anything + VISAM Demo

- Download the checkpoint `motrv2_dancetrack.pth` from [here](https://drive.google.com/file/d/1EA4lndu2yQcVgBKR09KfMe5efbf631Th/view?usp=share_link) for MOTRv2.
- Refer to the MOTRv2 setup if you have other questions about the installation.

- Run Demo

```shell
export CUDA_VISIBLE_DEVICES=0
python grounded_sam_visam.py \
  --meta_arch motr \
  --dataset_file e2e_dance \
  --with_box_refine \
  --query_interaction_layer QIMv2 \
  --num_queries 10 \
  --det_db det_db_motrv2.json \
  --use_checkpoint \
  --mot_path your_data_path \
  --resume motrv2_dancetrack.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --video_path DanceTrack/test/dancetrack0003
```
### :dancers: Interactive Editing
- We release the interactive fashion-edit playground [here](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/humanFace). Run it in the notebook and just click to annotate points for further segmentation. Enjoy it!

- We release the human-face-edit branch [here](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/humanFace). We'll keep updating this branch with more interesting features. Here are some examples:
## :camera: 3D-Box via Segment Anything

We extend the scope to the 3D world by combining Segment Anything and [VoxelNeXt](https://github.com/dvlab-research/VoxelNeXt). When we provide a prompt (e.g., a point / box), the result is not only a 2D segmentation mask, but also 3D boxes. Please check [voxelnext_3d_box](./voxelnext_3d_box/) for more details.
## :cupid: Acknowledgements

- [Segment Anything](https://github.com/facebookresearch/segment-anything)
- [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO)

## Contributors

Our project wouldn't be possible without the contributions of these amazing people! Thank you all for making this project better.

<a href="https://github.com/IDEA-Research/Grounded-Segment-Anything/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=IDEA-Research/Grounded-Segment-Anything" />
</a>

## Citation

If you find this project helpful for your research, please consider citing the following BibTeX entry.

```BibTex
@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@article{liu2023grounding,
  title={Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection},
  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal={arXiv preprint arXiv:2303.05499},
  year={2023}
}
```
---
title: Grounding SAM Inpainting
emoji: 🐠
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 4.10.0
app_file: grounded_sam_inpainting_demo.py
pinned: false
license: apache-2.0
---