- Large Language Models Think Too Fast To Explore Effectively
  Paper • 2501.18009 • Published • 23
- s1: Simple test-time scaling
  Paper • 2501.19393 • Published • 111
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 21
- SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
  Paper • 2502.20545 • Published • 20
Collections including paper arxiv:2501.19399
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 276
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 21
- FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
  Paper • 2502.01068 • Published • 16
- Scaling Embedding Layers in Language Models
  Paper • 2502.01637 • Published • 24

- FAN: Fourier Analysis Networks
  Paper • 2410.02675 • Published • 26
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 84
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 21
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
  Paper • 2502.09509 • Published • 7

- Agent Workflow Memory
  Paper • 2409.07429 • Published • 29
- MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis
  Paper • 2409.07129 • Published • 8
- Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
  Paper • 2409.04593 • Published • 26
- Imagine yourself: Tuning-Free Personalized Image Generation
  Paper • 2409.13346 • Published • 69

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 28
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 14
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 50
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 32