---
license: mit
datasets:
  - zer0int/CLIP-adversarial-typographic-attack_text-image
  - SPRIGHT-T2I/spright_coco
base_model:
  - openai/clip-vit-large-patch14
pipeline_tag: zero-shot-image-classification
library_name: transformers
---

# CLIP ViT-L/14 finetune: SAE-informed adversarial training


- Interesting things to try with adversarial robustness: right-click and download the individual images: Image 1 -- Image 2 -- Image 3
- Upload each into the zero-shot classification widget (hopefully available soon on the right here ->), or use the Python sketch after this list.
- Try these labels (class names): `a photo of a cat`, `a photo of a dog`, `a photo of a text`
- Repeat the same with e.g. my GmP models and see what happens. =)
- I'm really hoping the HF format .safetensors conversion didn't mess anything up (it happens!); just in case it did, or if there's no inference API available to use:
- I put a script that does the same thing (on the unconverted model) in my GitHub repo. Plus, you can reproduce the fine-tune yourself, as that code is also available! 🤗
- 👉 All training info & code: github.com/zer0int/CLIP-SAE-finetune
- Buy me a coffee
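
For convenience, here is a minimal sketch of the zero-shot test described above, using the standard `transformers` CLIP classes. The repo id `zer0int/CLIP-SAE-ViT-L-14` and the local filename `image1.png` are assumptions; substitute whatever image you downloaded.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumption: this model's HF repo id; adjust if your copy lives elsewhere.
model_id = "zer0int/CLIP-SAE-ViT-L-14"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# The labels suggested above, plus one of the downloaded images
# ("image1.png" is a hypothetical local filename).
labels = ["a photo of a cat", "a photo of a dog", "a photo of a text"]
image = Image.open("image1.png")

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Softmax over the image-text similarity logits gives per-label probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for label, p in zip(labels, probs):
    print(f"{label}: {p.item():.3f}")
```

Swapping `model_id` for one of the GmP models (or the original `openai/clip-vit-large-patch14`) lets you compare how each model reacts to the typographic-attack images.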
