Large scale batch watermark removal / inpainting

#5
by jferments - opened

How would you apply this script for batch watermark removal? I want to use your model with something like Inpaint-Anything to batch remove watermarks from large numbers of images. Basically, instead of just detecting whether there is a watermark, I need it to draw a bounding box / mask on the watermark and inpaint it out of the image. I can do this manually, but I have a dataset of 3+ million images that need to be de-watermarked for use as SDXL fine-tune training data.

Thanks for all of the great tools you've created btw - joycaption is working wonderfully for captioning these images!

jferments changed discussion title from Inpainting / Watermark removal to Large scale batch watermark removal / inpainting

The YOLO model will output a bounding box which could be used for that. You can see the space's code here for an example of how the models are used: https://huggingface.co/spaces/fancyfeast/joycaption-watermark-detection/blob/main/app.py

Just be aware that the YOLO model has slightly worse accuracy than the classification model.
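As a rough illustration of turning detector output into an inpainting mask, here is a minimal, dependency-free sketch. It assumes the boxes have already been converted to `(x0, y0, x1, y1)` pixel coordinates; the function name `boxes_to_mask` and the `pad` parameter are hypothetical and are not taken from the space's code.

```python
def boxes_to_mask(width, height, boxes, pad=0):
    """Return a row-major 8-bit mask (255 inside any box, 0 elsewhere).

    boxes: iterable of (x0, y0, x1, y1) pixel coordinates, x1/y1 exclusive.
    pad:   extra pixels added on every side so the mask covers watermark edges.
    """
    mask = bytearray(width * height)  # starts as all zeros (keep everything)
    for x0, y0, x1, y1 in boxes:
        # Grow the box by `pad`, then clamp it to the image bounds.
        left = max(0, int(x0) - pad)
        top = max(0, int(y0) - pad)
        right = min(width, int(x1) + pad)
        bottom = min(height, int(y1) + pad)
        if right <= left or bottom <= top:
            continue  # box is empty or lies fully outside the image
        row = b"\xff" * (right - left)
        for y in range(top, bottom):
            start = y * width + left
            mask[start:start + len(row)] = row
    return mask
```

With Pillow, `Image.frombytes("L", (width, height), bytes(mask))` turns the buffer into a grayscale image. White marking the region to inpaint is a common convention, but check what your inpainting tool expects.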

By the way, if the program you're using to finetune SDXL supports loss masking, that's a good alternative to inpainting. It lets you specify a loss mask for an image, i.e. parts of the image to "ignore" during training. So effectively SDXL will never get trained to produce watermarks. It saves you from having to expend the resources inpainting all the images, and you don't have to worry about imperfect inpainting negatively affecting the finetuning.
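To make the loss masking idea concrete, here is a toy sketch in plain Python. It is not any trainer's actual implementation: real trainers work on tensors, and for SDXL the mask would presumably need to be resized to the latent resolution. The name `masked_mse` and the flat-list inputs are assumptions for illustration only.

```python
def masked_mse(pred, target, keep):
    """Mean squared error over positions where keep[i] is 1; others are ignored.

    pred, target: flat sequences of floats of equal length.
    keep:         flat sequence of 0/1 weights (0 = watermark region, excluded).
    """
    if not (len(pred) == len(target) == len(keep)):
        raise ValueError("pred, target and keep must have the same length")
    total = 0.0
    kept = 0.0
    for p, t, k in zip(pred, target, keep):
        total += k * (p - t) ** 2  # masked positions contribute nothing
        kept += k
    # Normalise by the number of kept positions, not the full length,
    # so heavily masked images are not down-weighted.
    return total / kept if kept else 0.0
```

Because masked positions add nothing to the loss, they produce no gradient, which is why the model is never pushed toward reproducing the watermark pixels.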

Thanks! I hadn't heard of loss masking, and I'll have to play around with that as well for the images that slip through the watermark removal process.

But after getting frustrated trying to find a solution online, I actually ended up developing a solution to the batch watermark removal problem last night, thanks in large part to your YOLO watermark detection fine-tune.

I wrote a command-line utility that uses your model for watermark detection and then uses simple-lama to do the inpainting/removal. It works with multiple GPUs and is reasonably fast. On my dual 4090 machine, I'm able to de-watermark over 1000 images per minute, with a >99% detection rate and zero corrupted images that I'm aware of.
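For a sense of scale, here is a small sketch of how a file list could be split across GPU workers, plus the arithmetic for the 3 million image dataset at the quoted throughput. Both helper names are hypothetical, and the round-robin split is just one simple option, not necessarily how the utility distributes its work.

```python
def shard_round_robin(items, num_workers):
    """Split items into num_workers lists, dealing them out one at a time."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [list(items[i::num_workers]) for i in range(num_workers)]


def estimated_hours(num_images, images_per_minute):
    """Wall-clock hours to process num_images at a steady throughput."""
    return num_images / images_per_minute / 60


# Two workers, one per GPU: each gets every other file.
shards = shard_round_robin(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], 2)
print(shards)  # [['a.jpg', 'c.jpg', 'e.jpg'], ['b.jpg', 'd.jpg']]

# 3 million images at 1000 images per minute.
print(estimated_hours(3_000_000, 1000))  # 50.0
```

So at roughly 1000 images per minute, the full dataset would take on the order of 50 hours on that machine, assuming throughput stays steady.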

Here is the GitHub repo for the watermark removal script: jferments/watermark_remover
