---
license: cc-by-nc-4.0
tags:
- hyperbolic
- clip
- safeclip
- vision-and-language
- retrieval
- safety
- nsfw
- responsible ai
pipeline_tag: image-text-to-text
library_name: pytorch
---

# Model Card: HySAC

Hyperbolic Safety-Aware CLIP (HySAC), introduced in the paper [**Hyperbolic Safety-Aware Vision-Language Models**](https://arxiv.org/abs/2503.12127), is a fine-tuned CLIP model that leverages the hierarchical properties of hyperbolic space to enhance safety in vision-language tasks. HySAC models the relationship between safe and unsafe image-text pairs, enabling both effective retrieval of unsafe content and the dynamic redirection of unsafe queries towards safer alternatives.

## NSFW Definition

In our work, we adopt [Safe-CLIP's definition of NSFW](https://arxiv.org/abs/2211.05105): a finite and fixed set of concepts that are considered inappropriate, offensive, or harmful to individuals. These concepts are divided into seven categories: _hate, harassment, violence, self-harm, sexual, shocking, and illegal activities_.

## Use HySAC

The HySAC model can be loaded and used as shown below. Ensure you have installed the HySAC code from [our GitHub repository](https://github.com/aimagelab/HySAC).

```python
>>> from hysac.models import HySAC

>>> model_id = "aimagelab/hysac"
>>> model = HySAC.from_pretrained(model_id, device="cuda").to("cuda")
```

The standard `encode_image` and `encode_text` methods encode images and text, while `traverse_to_safe_image` and `traverse_to_safe_text` redirect query embeddings towards safer alternatives, as sketched below.
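
As a minimal sketch of how these methods fit together (the tokenization, image preprocessing, and tensor shapes below are illustrative assumptions; the actual helpers ship with the HySAC repository):

```python
>>> import torch

>>> # Hypothetical inputs: CLIP-style token ids (context length 77) and a
>>> # 224x224 RGB image batch; see the repository for the real preprocessing.
>>> text_tokens = torch.randint(0, 49408, (1, 77)).to("cuda")
>>> image_tensor = torch.randn(1, 3, 224, 224).to("cuda")

>>> text_feats = model.encode_text(text_tokens)    # text embedding in hyperbolic space
>>> image_feats = model.encode_image(image_tensor) # image embedding in hyperbolic space

>>> # Redirect potentially unsafe queries towards safer embeddings before retrieval
>>> safe_text_feats = model.traverse_to_safe_text(text_feats)
>>> safe_image_feats = model.traverse_to_safe_image(image_feats)
```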

## Model Details

HySAC is a fine-tuned version of the CLIP model, trained in hyperbolic space on the ViSU (Visual Safe and Unsafe) dataset, introduced in [this paper](https://arxiv.org/abs/2311.16254). The text portion of ViSU is publicly available on Hugging Face as [ViSU-Text](https://huggingface.co/datasets/aimagelab/ViSU-Text); the image portion is not released due to the presence of potentially harmful content.
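
ViSU-Text can be loaded with the standard `datasets` library (the split name below is an assumption; check the dataset card for the splits that are actually published):

```python
>>> from datasets import load_dataset

>>> # Load the public, text-only portion of ViSU from the Hugging Face Hub
>>> visu_text = load_dataset("aimagelab/ViSU-Text", split="train")
```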

**Model Release Date:** 17 March 2025.

For more information about the model, training details, dataset, and evaluation, please refer to the [paper](https://arxiv.org/abs/2503.12127).
Additional details are available in the [official HySAC repository](https://github.com/aimagelab/HySAC).

## Citation

Please cite with the following BibTeX:
|
``` |
|
@inproceedings{poppi2025hyperbolic, |
|
title={{Hyperbolic Safety-Aware Vision-Language Models}}, |
|
author={Poppi, Tobia and Kasarla, Tejaswi and Mettes, Pascal and Baraldi, Lorenzo and Cucchiara, Rita}, |
|
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, |
|
year={2025} |
|
} |
|
``` |