AIMv2 Collection A collection of AIMv2 vision encoders covering several resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22 • 67
Stabilizing Transformer Training by Preventing Attention Entropy Collapse Paper • 2303.06296 • Published Mar 11, 2023
Learning Controllable 3D Diffusion Models from Single-view Images Paper • 2304.06700 • Published Apr 13, 2023
What Algorithms can Transformers Learn? A Study in Length Generalization Paper • 2310.16028 • Published Oct 24, 2023 • 2
Value function estimation using conditional diffusion models for control Paper • 2306.07290 • Published Jun 9, 2023
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization Paper • 2401.15914 • Published Jan 29, 2024 • 7
How Far Are We from Intelligent Visual Deductive Reasoning? Paper • 2403.04732 • Published Mar 7, 2024 • 19
NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion Paper • 2302.10109 • Published Feb 20, 2023
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Paper • 2410.08159 • Published Oct 10, 2024 • 25
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21, 2024 • 43
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling Paper • 2405.21048 • Published May 31, 2024 • 13
Position Prediction as an Effective Pretraining Strategy Paper • 2207.07611 • Published Jul 15, 2022 • 1
Scalable Pre-training of Large Autoregressive Image Models Paper • 2401.08541 • Published Jan 16, 2024 • 36