HelpSteer2-Preference: Complementing Ratings with Preferences Paper • 2410.01257 • Published Oct 2, 2024 • 22
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published 9 days ago • 45
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published 27 days ago • 52
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 74
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 4 days ago • 60
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment Paper • 2412.13746 • Published 24 days ago • 9
📀 Dataset comparison models Collection 1.8B models trained on 350BT to compare different pretraining datasets • 8 items • Updated Jun 12, 2024 • 35
🧪 FineWeb v1 data experiments Collection Ablation models trained for our data experiments. • 22 items • Updated Jun 12, 2024 • 4
Open Preference Datasets Collection Among the public datasets for Alignment Learning, please help compile the good ones! (English or multilingual) (ultrafeedback-binarized format) • 2 items • Updated Dec 4, 2024 • 1
Thinking LLMs: General Instruction Following with Thought Generation Paper • 2410.10630 • Published Oct 14, 2024 • 18
Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations Paper • 2310.13420 • Published Oct 20, 2023 • 2
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 108
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models Paper • 2309.03883 • Published Sep 7, 2023 • 34
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 73