google/siglip2-so400m-patch14-384 · Question About SigLIP 2’s Performance with Newline-Separated Labels

Feb 22

I’ve been experimenting with SigLIP 1 and SigLIP 2 in the Hugging Face Space and noticed something interesting. I entered labels in two formats: comma-separated (e.g., "photojournalism photography, editorial photography") and comma-separated with newlines (e.g., "photojournalism photography,\neditorial photography,\n..."). SigLIP 1 consistently performs more accurately, while SigLIP 2 seems to perform better when there is a newline. I have also tested with a diverse set of images and noticed a similar pattern.

Could you shed light on why SigLIP 2 handles newline-separated labels better? Is this an intentional design choice, like training on noisier text data, or an artifact of the tokenizer?

Comma separated:

Comma separated + New line:

Thank you!

giffmana

Mar 9

This is unrelated to the model: you're using the Space wrong. Do not use any newlines; just comma-separate the labels, nothing else.
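
For anyone who keeps labels in a multi-line list, here is a minimal sketch of cleaning them up before pasting them into the Space. The helper name `normalize_labels` is hypothetical and not part of the Space or any library. It also assumes that a comma followed by a single space, as in the example in the question, is an acceptable separator.

```python
import re


def normalize_labels(raw: str) -> str:
    """Turn free-form label input into one comma-separated line.

    Hypothetical helper for illustration; not part of the Space.
    """
    # Treat commas and any newline style as label separators.
    parts = re.split(r"[,\r\n]+", raw)
    # Trim surrounding whitespace and drop empty entries.
    labels = [p.strip() for p in parts if p.strip()]
    # Assumption: ", " matches the comma-separated format the Space expects.
    return ", ".join(labels)


raw = "photojournalism photography,\neditorial photography,\n"
print(normalize_labels(raw))
# prints: photojournalism photography, editorial photography
```

The result contains no newline characters, so each label reaches the model as plain text with nothing else attached.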

zfjerome1 changed discussion status to closed Mar 9