- Large Language Models Think Too Fast To Explore Effectively
  Paper • 2501.18009 • Published • 23
- s1: Simple test-time scaling
  Paper • 2501.19393 • Published • 111
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 21
- SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
  Paper • 2502.20545 • Published • 20
Collections including paper arxiv:2501.19399
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 276
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 21
- FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
  Paper • 2502.01068 • Published • 16
- Scaling Embedding Layers in Language Models
  Paper • 2502.01637 • Published • 24

- FAN: Fourier Analysis Networks
  Paper • 2410.02675 • Published • 26
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 84
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 21
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
  Paper • 2502.09509 • Published • 7

- Agent Workflow Memory
  Paper • 2409.07429 • Published • 29
- MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis
  Paper • 2409.07129 • Published • 8
- Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
  Paper • 2409.04593 • Published • 26
- Imagine yourself: Tuning-Free Personalized Image Generation
  Paper • 2409.13346 • Published • 69

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 28
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 14
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 50
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 32