Jamin X. Chen (caesar0301)
0 followers · 1 following
https://xiaming.site · drjaminchen · caesar0301
AI & ML interests: Behavior data analysis, Knowledge computing
Recent Activity

reacted to Kseniase's post, 4 days ago:

15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail, and now it's time to summarize the other existing types of attention. Here is a list of 15 types of attention mechanisms used in AI models (a short runnable sketch of items 1, 3, and 5 follows the list):

1. Soft attention (Deterministic attention) -> https://huggingface.co/papers/1409.0473
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> https://huggingface.co/papers/1508.04025
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> https://huggingface.co/papers/1706.03762
Each element in the sequence "looks" at the other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-attention (Encoder-decoder attention) -> https://huggingface.co/papers/2104.08771
The queries come from one sequence and the keys/values come from another, allowing a model to combine information from two different sources.

5. Multi-Head Attention (MHA) -> https://huggingface.co/papers/1706.03762
Multiple attention "heads" run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections for queries, keys, and values.

6. Multi-Head Latent Attention (MLA) -> https://huggingface.co/papers/2405.04434
Extends MHA by incorporating a latent space where attention heads can dynamically learn different latent factors or representations.

7. Memory-based attention -> https://huggingface.co/papers/1503.08895
Uses attention to read from and write to an external memory.

See other types in the comments.
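As a companion to the list above, here is a minimal NumPy sketch of soft scaled dot-product attention (item 1) used as multi-head self-attention (items 3 and 5). It is illustrative only: the random matrices stand in for learned projection weights, and the shapes and function names are my own, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Soft attention: continuous weights over all keys, summing to 1 per query."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)  # (seq_q, seq_k) similarity scores
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ V                              # weighted sum of the values

def multi_head_self_attention(X, num_heads):
    """Self-attention: Q, K, V are all projections of the same sequence X.
    Each head has its own query/key/value projections."""
    d_model = X.shape[-1]
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Random stand-ins for the trained projection matrices W_q, W_k, W_v.
        W_q = rng.normal(size=(d_model, d_head))
        W_k = rng.normal(size=(d_model, d_head))
        W_v = rng.normal(size=(d_model, d_head))
        heads.append(scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v))
    W_o = rng.normal(size=(d_model, d_model))       # final output projection
    return np.concatenate(heads, axis=-1) @ W_o

X = rng.normal(size=(6, 32))   # a toy sequence: 6 tokens, model dim 32
out = multi_head_self_attention(X, num_heads=4)
print(out.shape)               # (6, 32)
```

For cross-attention (item 4), the only change is that Q would be projected from one sequence while K and V are projected from another.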
liked a model, 18 days ago: allenai/olmOCR-7B-0225-preview
caesar0301's activity
liked a model, 18 days ago:
allenai/olmOCR-7B-0225-preview (Image-Text-to-Text) • Updated 25 days ago • 491k downloads • 567 likes
liked a Space, 23 days ago:
Dit Document Layout Analysis (Running) • 181 likes • "Analyze document layout from images"
liked a model, 5 months ago:
Qwen/Qwen2.5-0.5B-Instruct (Text Generation) • Updated Sep 25, 2024 • 999k downloads • 273 likes
liked a dataset, 6 months ago:
yifanzhang114/MME-RealWorld (Preview) • Updated Nov 14, 2024 • 719 downloads • 14 likes
liked a model, 8 months ago:
breezedeus/pix2text-mfr (Image-to-Text) • Updated May 5, 2024 • 21.6k downloads • 34 likes
liked a dataset, 10 months ago:
webnlg/challenge-2023 (Viewer) • Updated Mar 10, 2023 • 65.6k rows • 319 downloads • 4 likes
liked a model, 11 months ago:
Babelscape/rebel-large (Text2Text Generation) • Updated Jun 20, 2023 • 30.1k downloads • 217 likes
liked a dataset, 11 months ago:
HuggingFaceFW/fineweb (Viewer) • Updated Jan 31 • 25B rows • 290k downloads • 2.05k likes
liked 2 models, 11 months ago:
zjunlp/OneKE (Text Generation) • Updated May 6, 2024 • 355 downloads • 42 likes
shenzhi-wang/Llama3-8B-Chinese-Chat (Text Generation) • Updated Jul 4, 2024 • 33.1k downloads • 678 likes
liked a model, about 1 year ago:
jinaai/jina-colbert-v1-en • Updated Jan 6 • 958 downloads • 99 likes
liked 2 models, over 1 year ago:
meta-llama/Llama-2-7b (Text Generation) • Updated Apr 17, 2024 • 4.28k likes
cerebras/Cerebras-GPT-590M (Text Generation) • Updated Nov 22, 2023 • 1.84k downloads • 20 likes
liked a dataset, over 1 year ago:
roneneldan/TinyStories (Viewer) • Updated Aug 12, 2024 • 2.14M rows • 22.7k downloads • 631 likes
liked a dataset, almost 2 years ago:
OpenAssistant/oasst1 (Viewer) • Updated May 2, 2023 • 88.8k rows • 9.48k downloads • 1.36k likes
liked 2 models, about 2 years ago:
ClueAI/ChatYuan-large-v1 (Text2Text Generation) • Updated Apr 2, 2023 • 53 downloads • 106 likes
uer/gpt2-chinese-cluecorpussmall (Text Generation) • Updated Oct 17, 2023 • 33.2k downloads • 199 likes