BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences Paper • 2403.09347 • Published Mar 14 • 20
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents Paper • 2311.05437 • Published Nov 9, 2023 • 48
KTO: Model Alignment as Prospect Theoretic Optimization Paper • 2402.01306 • Published Feb 2 • 16