Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published 16 days ago • 62
Kanana: Compute-efficient Bilingual Language Models Paper • 2502.18934 • Published about 1 month ago • 64
Self-Training Large Language Models for Tool-Use Without Demonstrations Paper • 2502.05867 • Published Feb 9
Beyond Release: Access Considerations for Generative AI Systems Paper • 2502.16701 • Published Feb 23 • 12
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Paper • 2502.17407 • Published Feb 24 • 25
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 215
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 139
Post 14807 Google drops Gemini 2.0 Flash Thinking, a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more. Now available in anychat, try it out: akhaliq/anychat
Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models Paper • 2412.02980 • Published Dec 4, 2024 • 14
Post 15044 QwQ-32B-Preview is now available in anychat. A reasoning model that is competitive with OpenAI o1-mini and o1-preview. Try it out: akhaliq/anychat