---
license: cc-by-nc-4.0
tags:
- hyperbolic
- clip
- safeclip
- vision-and-language
- retrieval
- safety
- nsfw
- responsible ai
pipeline_tag: image-text-to-text
library_name: pytorch
---

# Model Card: HySAC

Hyperbolic Safety-Aware CLIP (HySAC), introduced in the paper [**Hyperbolic Safety-Aware Vision-Language Models**](https://arxiv.org/abs/2503.12127), is a fine-tuned CLIP model that leverages the hierarchical properties of hyperbolic space to enhance safety in vision-language tasks. HySAC models the relationship between safe and unsafe image-text pairs, enabling both effective retrieval of unsafe content and the dynamic redirection of unsafe queries towards safer alternatives.

## NSFW Definition

In our work, we adopt [Safe-CLIP's definition of NSFW](https://arxiv.org/abs/2211.05105): a finite and fixed set of concepts that are considered inappropriate, offensive, or harmful to individuals. These concepts are divided into seven categories: _hate, harassment, violence, self-harm, sexual, shocking, and illegal activities_.

## Use HySAC

The HySAC model can be loaded and used as shown below. Ensure you have installed the HySAC code from [our GitHub repository](https://github.com/aimagelab/HySAC).

```python
>>> from hysac.models import HySAC

>>> model_id = "aimagelab/hysac"
>>> model = HySAC.from_pretrained(model_id, device="cuda").to("cuda")
```

The standard `encode_image` and `encode_text` methods encode images and text, while `traverse_to_safe_image` and `traverse_to_safe_text` redirect query embeddings towards safer alternatives, as sketched below.
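
As a minimal sketch of how these methods fit together (the tokenization, image preprocessing, and tensor shapes below are illustrative assumptions; the actual helpers ship with the HySAC repository):

```python
>>> import torch

>>> # Hypothetical inputs: CLIP-style token ids (context length 77) and a
>>> # 224x224 RGB image batch; see the repository for the real preprocessing.
>>> text_tokens = torch.randint(0, 49408, (1, 77)).to("cuda")
>>> image_tensor = torch.randn(1, 3, 224, 224).to("cuda")

>>> text_feats = model.encode_text(text_tokens)    # text embedding in hyperbolic space
>>> image_feats = model.encode_image(image_tensor) # image embedding in hyperbolic space

>>> # Redirect potentially unsafe queries towards safer embeddings before retrieval
>>> safe_text_feats = model.traverse_to_safe_text(text_feats)
>>> safe_image_feats = model.traverse_to_safe_image(image_feats)
```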

## Model Details

HySAC is a fine-tuned version of the CLIP model, trained in hyperbolic space on the ViSU (Visual Safe and Unsafe) dataset, introduced in [this paper](https://arxiv.org/abs/2311.16254). The text portion of ViSU is publicly available on Hugging Face as [ViSU-Text](https://huggingface.co/datasets/aimagelab/ViSU-Text); the image portion is not released due to the presence of potentially harmful content.
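
ViSU-Text can be loaded with the standard `datasets` library (the split name below is an assumption; check the dataset card for the splits that are actually published):

```python
>>> from datasets import load_dataset

>>> # Load the public, text-only portion of ViSU from the Hugging Face Hub
>>> visu_text = load_dataset("aimagelab/ViSU-Text", split="train")
```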

**Model Release Date:** 17 March 2025.

For more information about the model, training details, dataset, and evaluation, please refer to the [paper](https://arxiv.org/abs/2503.12127).
Additional details are available in the [official HySAC repository](https://github.com/aimagelab/HySAC).

## Citation

Please cite with the following BibTeX:
|
``` |
|
@inproceedings{poppi2025hyperbolic, |
|
title={{Hyperbolic Safety-Aware Vision-Language Models}}, |
|
author={Poppi, Tobia and Kasarla, Tejaswi and Mettes, Pascal and Baraldi, Lorenzo and Cucchiara, Rita}, |
|
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, |
|
year={2025} |
|
} |
|
``` |