Why softmax?
#2 opened by zhshj0110
Why is the softmax function applied last in the example usage?
I noticed that this is a SigLIP model, and SigLIP uses the sigmoid function both in the original paper and in its usage examples.
Although the choice of activation function does not change the final matching result (the highest-scoring label is the same either way), I don't understand why it was designed this way.
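Here is a small illustration of why the matching result does not change. It uses made-up logits for one image scored against three candidate labels, not real model outputs. Both softmax and sigmoid are monotonically increasing, so they rank the labels in the same order. They differ in what the scores mean. Softmax normalizes across the candidates so the scores sum to 1, which treats the labels as mutually exclusive. Sigmoid gives each image-text pair an independent probability, which matches the pairwise loss SigLIP was trained with.

```python
import math


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ranking(scores):
    # Indices of the labels ordered from highest to lowest score.
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Hypothetical logits for one image against three candidate labels.
logits = [2.0, -1.0, 0.5]

softmax_probs = softmax(logits)
sigmoid_probs = [sigmoid(x) for x in logits]

print(ranking(softmax_probs), ranking(sigmoid_probs))  # [0, 2, 1] [0, 2, 1]
print(round(sum(softmax_probs), 6))  # 1.0
print(round(sum(sigmoid_probs), 4))  # 1.7722 (independent probabilities need not sum to 1)
```

The ranking is identical, but only the softmax scores form a distribution over the candidate labels. Sigmoid scores can all be high or all be low at the same time.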