view article Article Say hello to `hf`: a faster, friendlier Hugging Face CLI ✨ By Wauplin and 2 others • 24 days ago • 75
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment Paper • 2507.20984 • Published 20 days ago • 54
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17 • 243
view article Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders By thomwolf and 1 other • Jul 9 • 645
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Paper • 2507.01352 • Published Jul 2 • 53
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs Paper • 2506.15211 • Published Jun 18 • 36
HardTests: Synthesizing High-Quality Test Cases for LLM Coding Paper • 2505.24098 • Published May 30 • 44
view article Article Falcon-Arabic: A Breakthrough in Arabic Language Models By tiiuae and 7 others • May 21 • 34
Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval Paper • 2505.16967 • Published May 22 • 23
U-MATH and μ-MATH - University-level math evaluation Collection Paper: A UNIVERSITY-LEVEL BENCHMARK FOR EVALUATING MATHEMATICAL SKILLS IN LLMS • 4 items • Updated Jan 14 • 17
SARD: Synthetic Arabic Recognition Dataset Collection A large-scale synthetic Arabic OCR dataset comprising 843,622 book-style document images across 10 fonts, designed to advance VLM for Arabic Texts • 2 items • Updated May 19 • 4
view article Article LeRobot Community Datasets: The “ImageNet” of Robotics — When and How? By danaaubakirova and 6 others • May 11 • 74
ZeroSearch: Incentivize the Search Capability of LLMs without Searching Paper • 2505.04588 • Published May 7 • 66
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models Paper • 2504.20605 • Published Apr 29 • 14
Sadeed: Advancing Arabic Diacritization Through Small Language Model Paper • 2504.21635 • Published Apr 30 • 59