Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail; now it's time to summarize the other existing types of attention.
Here is a list of 15 types of attention mechanisms used in AI models:
3. Self-attention -> Attention Is All You Need (1706.03762) Each element in the sequence "looks" at the other elements and "decides" how much to borrow from each of them for its new representation (see the first code sketch below).
5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762) Multiple attention "heads" run in parallel: the model computes several attention distributions, each with its own set of learned projections of queries, keys, and values (see the second code sketch below).
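To make self-attention concrete, here is a minimal sketch of single-head scaled dot-product attention in PyTorch. The toy dimensions, random weights, and names (`x`, `W_q`, `W_k`, `W_v`) are illustrative assumptions rather than the paper's exact setup; masking and batching are omitted.

```python
# Minimal sketch of single-head self-attention (scaled dot-product).
# Toy sizes and random projections are assumptions for illustration only.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

seq_len, d_model = 4, 8            # 4 tokens, 8-dim embeddings (toy sizes)
x = torch.randn(seq_len, d_model)  # token representations

# Learned projections for queries, keys, and values (here: random weights).
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Each token scores every other token, then mixes their values accordingly.
scores = Q @ K.T / d_model ** 0.5       # (seq_len, seq_len)
weights = F.softmax(scores, dim=-1)     # how much each token "borrows" from the others
output = weights @ V                    # new representations, (seq_len, d_model)
```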
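And a matching sketch of multi-head attention under the same assumptions: two heads, each with its own Q/K/V projections, compute their attention distributions in parallel before the results are concatenated. Real implementations also add a learned output projection, masking, and batching.

```python
# Minimal sketch of multi-head attention with 2 heads.
# Dimensions and random weights are illustrative assumptions, not a reference implementation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads
x = torch.randn(seq_len, d_model)

# One set of Q/K/V projections per head, each mapping into a d_head-dim subspace.
W_q = torch.randn(n_heads, d_model, d_head)
W_k = torch.randn(n_heads, d_model, d_head)
W_v = torch.randn(n_heads, d_model, d_head)

# Project the same input with every head's weights: (n_heads, seq_len, d_head).
Q = torch.einsum('sd,hde->hse', x, W_q)
K = torch.einsum('sd,hde->hse', x, W_k)
V = torch.einsum('sd,hde->hse', x, W_v)

# Each head computes its own attention distribution in parallel.
scores = Q @ K.transpose(-2, -1) / d_head ** 0.5   # (n_heads, seq_len, seq_len)
weights = F.softmax(scores, dim=-1)
heads = weights @ V                                # (n_heads, seq_len, d_head)

# Concatenate the heads back to d_model; a learned output projection usually follows.
output = heads.transpose(0, 1).reshape(seq_len, d_model)
```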