dame rajee

damerajee

AI & ML interests

None yet

Recent Activity

upvoted a paper 2 days ago

LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

updated a model 3 days ago

damerajee/super-transformers-model

published a model 4 days ago

damerajee/super-transformers-model

Organizations

damerajee's activity

upvoted a paper 2 days ago

LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

Paper • 2502.15007 • Published 6 days ago • 139

upvoted a paper 4 days ago

Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation

Paper • 2502.14846 • Published 6 days ago • 13

upvoted a paper 8 days ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published 11 days ago • 134

upvoted a paper 14 days ago

Matryoshka Quantization

Paper • 2502.06786 • Published 16 days ago • 29

upvoted an article 19 days ago

What is test-time compute and how to scale it?

Article • and 1 other • 20 days ago • 42

upvoted an article 21 days ago

π0 and π0-FAST: Vision-Language-Action Models for General Robot Control

Article • 23 days ago • 109

upvoted a paper 26 days ago

WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training

Paper • 2501.18511 • Published 27 days ago • 19

upvoted a paper 28 days ago

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Paper • 2501.16975 • Published 29 days ago • 26

upvoted 2 articles about 1 month ago

Topic 24: What is Cosmos World Foundation Model Platform?

Article • and 1 other • Jan 23 • 6

Timm ❤️ Transformers: Use any timm model with transformers

Article • Jan 16 • 40

upvoted 3 papers about 1 month ago

upvoted 3 papers about 2 months ago

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8 • 258

Scaling Laws for Floating Point Quantization Training

Paper • 2501.02423 • Published Jan 5 • 26

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 90

upvoted an article about 2 months ago

🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark

Article • Jan 2 • 40

upvoted a paper about 2 months ago

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Paper • 2412.19326 • Published Dec 26, 2024 • 18

upvoted 2 papers 2 months ago

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 37

Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
