---
license: cc-by-nc-4.0
tags:
- hyperbolic
- clip
- safeclip
- vision-and-language
- retrieval
- safety
- nsfw
- responsible ai
pipeline_tag: image-text-to-text
library_name: pytorch
---

# Model Card: HySAC

Hyperbolic Safety-Aware CLIP (HySAC), introduced in the paper [**Hyperbolic Safety-Aware Vision-Language Models**](https://arxiv.org/abs/2503.12127), is a fine-tuned CLIP model that leverages the hierarchical properties of hyperbolic space to enhance safety in vision-language tasks. HySAC models the relationship between safe and unsafe image-text pairs, enabling effective retrieval of unsafe content and the ability to dynamically redirect unsafe queries to safer alternatives.

## NSFW Definition

In our work we use [Safe-CLIP's definition of NSFW](https://arxiv.org/abs/2211.05105): a finite and fixed set of concepts that are considered inappropriate, offensive, or harmful to individuals. These concepts are divided into seven categories: _hate, harassment, violence, self-harm, sexual, shocking and illegal activities_.

#### Use HySAC

The HySAC model can be loaded and used as shown below. Ensure you have installed the HySAC code from [our GitHub repository](https://github.com/aimagelab/HySAC).

```python
>>> from hysac.models import HySAC

>>> model_id = "aimagelab/hysac"
>>> model = HySAC.from_pretrained(model_id, device="cuda").to("cuda")
```

The standard `encode_image` and `encode_text` methods encode images and text, while `traverse_to_safe_image` and `traverse_to_safe_text` redirect query embeddings towards safer alternatives.

## Model Details

HySAC is a fine-tuned version of the CLIP model, trained in hyperbolic space on the ViSU (Visual Safe and Unsafe) Dataset, introduced in [this paper](https://arxiv.org/abs/2311.16254). The text portion of the ViSU dataset is publicly available on HuggingFace as [ViSU-Text](https://huggingface.co/datasets/aimagelab/ViSU-Text). The image portion is not released due to the presence of potentially harmful content.

**Model Release Date** 17 March 2025.

For more information about the model, training details, dataset, and evaluation, please refer to the [paper](https://arxiv.org/abs/2503.12127). Additional details are available in the [official HySAC repository](https://github.com/aimagelab/HySAC).

## Citation

Please cite with the following BibTeX:

```
@inproceedings{poppi2025hyperbolic,
  title={{Hyperbolic Safety-Aware Vision-Language Models}},
  author={Poppi, Tobia and Kasarla, Tejaswi and Mettes, Pascal and Baraldi, Lorenzo and Cucchiara, Rita},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}
```
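
#### Example: redirecting a text query towards safe content

The snippet below is a minimal, unverified sketch of how the traversal methods might be combined for safe text-to-image retrieval. It assumes that `encode_text` accepts CLIP-style token tensors produced by the OpenAI `clip` tokenizer and that `traverse_to_safe_text` returns a redirected embedding; the exact input formats and signatures are defined in the [HySAC repository](https://github.com/aimagelab/HySAC), which should be treated as the reference.

```python
# Minimal sketch (assumptions: CLIP-style tokenization, tensor-in/tensor-out
# signatures for encode_text and traverse_to_safe_text).
import clip   # assumption: the OpenAI CLIP tokenizer matches HySAC's text encoder
import torch

from hysac.models import HySAC

device = "cuda"
model = HySAC.from_pretrained("aimagelab/hysac", device=device).to(device)

tokens = clip.tokenize(["a possibly unsafe query"]).to(device)
with torch.no_grad():
    text_emb = model.encode_text(tokens)               # query embedding in hyperbolic space
    safe_emb = model.traverse_to_safe_text(text_emb)   # redirect towards the safe region

# `safe_emb` can then be scored against image embeddings from `model.encode_image(...)`
# so that retrieval returns safe content for the redirected query.
```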