Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 9 days ago • 134
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Paper • 2502.12115 • Published 8 days ago • 41
Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference • Jan 16 • 68
Executable Code Actions Elicit Better LLM Agents Paper • 2402.01030 • Published Feb 1, 2024 • 73
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 78
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 529
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 140
Prompt Order Experiment Collection Prompt Order Experiment shows how to run a simple experiment on the Hub and leverage tools like AutoTrain and Inference Endpoints. • 16 items • Updated Jan 14 • 2
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 128
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated 5 days ago • 240
Article Releasing Outlines-core 0.1.0: structured generation in Rust and Python • Oct 22, 2024 • 44
AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published Oct 21, 2024 • 59
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation Paper • 2410.07170 • Published Oct 9, 2024 • 15
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 15 days ago • 296
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 models • 15 items • Updated Dec 6, 2024 • 570