Jamin X. Chen (caesar0301)
0 followers · 1 following
https://xiaming.site · drjaminchen · caesar0301
AI & ML interests: Behavior data analysis, Knowledge computing
Recent Activity

reacted to Kseniase's post, 4 days ago:

15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail, and now it's time to summarize the other existing types of attention. Here is a list of 15 types of attention mechanisms used in AI models (a short runnable sketch of items 1, 3, and 5 follows the list):

1. Soft attention (Deterministic attention) -> https://huggingface.co/papers/1409.0473
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> https://huggingface.co/papers/1508.04025
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> https://huggingface.co/papers/1706.03762
Each element in the sequence "looks" at the other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-attention (Encoder-decoder attention) -> https://huggingface.co/papers/2104.08771
The queries come from one sequence and the keys/values come from another, allowing a model to combine information from two different sources.

5. Multi-Head Attention (MHA) -> https://huggingface.co/papers/1706.03762
Multiple attention "heads" run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections for queries, keys, and values.

6. Multi-Head Latent Attention (MLA) -> https://huggingface.co/papers/2405.04434
Extends MHA by incorporating a latent space where attention heads can dynamically learn different latent factors or representations.

7. Memory-based attention -> https://huggingface.co/papers/1503.08895
Uses attention to read from and write to an external memory.

See other types in the comments.
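As a companion to the list above, here is a minimal NumPy sketch of soft scaled dot-product attention (item 1) used as multi-head self-attention (items 3 and 5). It is illustrative only: the random matrices stand in for learned projection weights, and the shapes and function names are my own, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Soft attention: continuous weights over all keys, summing to 1 per query."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)  # (seq_q, seq_k) similarity scores
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ V                              # weighted sum of the values

def multi_head_self_attention(X, num_heads):
    """Self-attention: Q, K, V are all projections of the same sequence X.
    Each head has its own query/key/value projections."""
    d_model = X.shape[-1]
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Random stand-ins for the trained projection matrices W_q, W_k, W_v.
        W_q = rng.normal(size=(d_model, d_head))
        W_k = rng.normal(size=(d_model, d_head))
        W_v = rng.normal(size=(d_model, d_head))
        heads.append(scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v))
    W_o = rng.normal(size=(d_model, d_model))       # final output projection
    return np.concatenate(heads, axis=-1) @ W_o

X = rng.normal(size=(6, 32))   # a toy sequence: 6 tokens, model dim 32
out = multi_head_self_attention(X, num_heads=4)
print(out.shape)               # (6, 32)
```

For cross-attention (item 4), the only change is that Q would be projected from one sequence while K and V are projected from another.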
liked a model, 18 days ago: allenai/olmOCR-7B-0225-preview
caesar0301's activity
liked a model, 18 days ago:
allenai/olmOCR-7B-0225-preview (Image-Text-to-Text) • Updated 25 days ago • 491k downloads • 567 likes
liked a Space, 23 days ago:
Dit Document Layout Analysis (Running) • 181 likes • "Analyze document layout from images"
liked a model, 5 months ago:
Qwen/Qwen2.5-0.5B-Instruct (Text Generation) • Updated Sep 25, 2024 • 999k downloads • 273 likes
liked a dataset, 6 months ago:
yifanzhang114/MME-RealWorld (Preview) • Updated Nov 14, 2024 • 719 downloads • 14 likes
liked a model, 8 months ago:
breezedeus/pix2text-mfr (Image-to-Text) • Updated May 5, 2024 • 21.6k downloads • 34 likes
liked a dataset, 10 months ago:
webnlg/challenge-2023 (Viewer) • Updated Mar 10, 2023 • 65.6k rows • 319 downloads • 4 likes
liked a model, 11 months ago:
Babelscape/rebel-large (Text2Text Generation) • Updated Jun 20, 2023 • 30.1k downloads • 217 likes
liked a dataset, 11 months ago:
HuggingFaceFW/fineweb (Viewer) • Updated Jan 31 • 25B rows • 290k downloads • 2.05k likes
liked 2 models, 11 months ago:
zjunlp/OneKE (Text Generation) • Updated May 6, 2024 • 355 downloads • 42 likes
shenzhi-wang/Llama3-8B-Chinese-Chat (Text Generation) • Updated Jul 4, 2024 • 33.1k downloads • 678 likes
liked a model, about 1 year ago:
jinaai/jina-colbert-v1-en • Updated Jan 6 • 958 downloads • 99 likes
liked 2 models, over 1 year ago:
meta-llama/Llama-2-7b (Text Generation) • Updated Apr 17, 2024 • 4.28k likes
cerebras/Cerebras-GPT-590M (Text Generation) • Updated Nov 22, 2023 • 1.84k downloads • 20 likes
liked a dataset, over 1 year ago:
roneneldan/TinyStories (Viewer) • Updated Aug 12, 2024 • 2.14M rows • 22.7k downloads • 631 likes
liked a dataset, almost 2 years ago:
OpenAssistant/oasst1 (Viewer) • Updated May 2, 2023 • 88.8k rows • 9.48k downloads • 1.36k likes
liked 2 models, about 2 years ago:
ClueAI/ChatYuan-large-v1 (Text2Text Generation) • Updated Apr 2, 2023 • 53 downloads • 106 likes
uer/gpt2-chinese-cluecorpussmall (Text Generation) • Updated Oct 17, 2023 • 33.2k downloads • 199 likes