Output attention shape

#2
by Yingshu - opened

I inspected the output attentions and found that the output is a tuple of 4 elements, with the following shapes:
[64,4,49,49]
[16,8,49,49]
[4,16,49,49]
[1,32,49,49]

I understand that the second dimension (4, 8, 16, 32) is num_heads. What does the first dimension (64, 16, 4, 1) represent?
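
One possible reading, offered only as a guess: if this is a Swin-style model with windowed attention, the first dimension could be batch_size * num_windows at each stage. The sketch below checks that arithmetic. Everything in it except the observed shapes is an assumption that the post does not confirm: 7x7 windows (which would explain the 49), a 224x224 input, patch size 4, batch size 1, and a 2x downsampling between stages.

```python
# Assumed values (hypothetical, not stated in the post).
window_size = 7     # 7 * 7 = 49 tokens per window
image_size = 224    # input resolution
patch_size = 4      # patch embedding stride
batch_size = 1

# First dimensions reported in the post, one per stage.
observed_first_dims = [64, 16, 4, 1]

tokens_per_window = window_size * window_size

side = image_size // patch_size  # patches per side at the first stage
computed = []
for _ in observed_first_dims:
    windows_per_side = side // window_size
    computed.append(batch_size * windows_per_side ** 2)
    side //= 2  # assumed 2x downsampling between stages

print(tokens_per_window)
print(computed)
```

Under these assumptions the computed values match the observed ones, so each tensor would have the layout [batch_size * num_windows, num_heads, tokens_per_window, tokens_per_window]. With a different input size or batch size the first dimension should change accordingly, which would be a quick way to confirm or rule out this reading.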
