Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2412.09871

Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement

Paper • 2411.06558 • Published Nov 10 • 34
SlimLM: An Efficient Small Language Model for On-Device Document Assistance

Paper • 2411.09944 • Published Nov 15 • 12
Look Every Frame All at Once: Video-Ma^2mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing

Paper • 2411.19460 • Published 28 days ago • 10
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

Paper • 2412.05237 • Published 20 days ago • 46

Differential Transformer

Paper • 2410.05258 • Published Oct 7 • 168
PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published 22 days ago • 119
VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published 21 days ago • 104
o1-Coder: an o1 Replication for Coding

Paper • 2412.00154 • Published 28 days ago • 41

Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection

Paper • 2409.08513 • Published Sep 13 • 11
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published Sep 12 • 43
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18 • 74
LLMs + Persona-Plug = Personalized LLMs

Paper • 2409.11901 • Published Sep 18 • 31

STaR: Bootstrapping Reasoning With Reasoning

Paper • 2203.14465 • Published Mar 28, 2022 • 8
Scaling Laws for Neural Language Models

Paper • 2001.08361 • Published Jan 23, 2020 • 6
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 14 days ago • 76

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Paper • 2407.08083 • Published Jul 10 • 27
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20 • 58
The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27 • 37
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published Sep 17 • 28

Human-like Episodic Memory for Infinite Context LLMs

Paper • 2407.09450 • Published Jul 12 • 59
MUSCLE: A Model Update Strategy for Compatible LLM Evolution

Paper • 2407.09435 • Published Jul 12 • 20
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Paper • 2407.09121 • Published Jul 12 • 5
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

Paper • 2407.14482 • Published Jul 19 • 25

Depth Anything V2

Paper • 2406.09414 • Published Jun 13 • 95
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13 • 50
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Paper • 2406.04338 • Published Jun 6 • 34
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 109

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023 • 1
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 14
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 5

about 20 hours ago

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Paper • 2404.15653 • Published Apr 24 • 26
MoDE: CLIP Data Experts via Clustering

Paper • 2404.16030 • Published Apr 24 • 12
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20 • 46
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published May 21 • 28

Papers - Llama 3

How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published Apr 22 • 44
Reasoning in Large Language Models: A Geometric Perspective

Paper • 2407.02678 • Published Jul 2 • 1
Natural Language Reinforcement Learning

Paper • 2411.14251 • Published Nov 21 • 27
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 14 days ago • 76

Previous
1
...
4
5
6
7
8
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs