Why softmax?

#2
by zhshj0110 - opened

Why is the softmax function applied last in the example usage?
I notice that this is a SigLIP model, and SigLIP uses the sigmoid function both in the original paper and in its usage examples.
Using softmax instead of sigmoid does not change the final matching results, since both preserve the ordering of the logits, but I don't understand why it is designed this way.
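
To make the point about the matching results concrete, here is a minimal sketch in plain Python. The logit values and the helper names (`sigmoid`, `softmax`, `ranking`) are made up for illustration and are not taken from the model's example code. It only shows that both functions rank the candidate texts identically, while the scores themselves differ: softmax scores sum to 1 across candidates, sigmoid scores are independent per pair.

```python
import math

def sigmoid(logits):
    # Independent per-pair probability, as used in SigLIP.
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

def softmax(logits):
    # Probabilities normalized across all candidates (subtract max for stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ranking(scores):
    # Candidate indices sorted from best to worst score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical logits for one image against three candidate texts.
logits = [-1.2, 3.4, 0.5]

sig_scores = sigmoid(logits)
soft_scores = softmax(logits)

print("sigmoid:", sig_scores, "ranking:", ranking(sig_scores))
print("softmax:", soft_scores, "ranking:", ranking(soft_scores))
```

Both functions are strictly increasing in each logit when the other logits are held fixed, so the best match is the same either way. What changes is the interpretation of the numbers.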
