Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws Paper • 2401.00448 • Published Dec 31, 2023 • 28
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation Paper • 2206.10789 • Published Jun 22, 2022 • 4
How Do Large Language Models Acquire Factual Knowledge During Pretraining? Paper • 2406.11813 • Published Jun 17 • 30
Falcon Mamba: The First Competitive Attention-free 7B Language Model Paper • 2410.05355 • Published 11 days ago • 26
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering Paper • 2410.07095 • Published 9 days ago • 6
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published 10 days ago • 102
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published 17 days ago • 131
Article wHy DoNt YoU jUsT uSe ThE lLaMa ToKeNiZeR?? By catherinearnett • 21 days ago • 33
AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers Paper • 2402.05602 • Published Feb 8 • 4
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published 23 days ago • 96
FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators Paper • 2202.11214 • Published Feb 22, 2022 • 1
LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench Paper • 2409.13373 • Published 28 days ago • 2
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change Paper • 2206.10498 • Published Jun 21, 2022 • 1
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models Paper • 2409.13592 • Published 28 days ago • 46
Article Does Daily Software Engineering Work Need Reasoning Models? By onekq • 24 days ago • 5
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 30 days ago • 258
Article Fine-tuning LLMs to 1.58bit: extreme quantization made easy about 1 month ago • 156
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems Paper • 2402.12875 • Published Feb 20 • 13
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published Sep 12 • 66
Theory, Analysis, and Best Practices for Sigmoid Self-Attention Paper • 2409.04431 • Published Sep 6 • 1
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3 • 80
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning Paper • 2406.04520 • Published Jun 6 • 10
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4 • 72
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning Paper • 2408.14158 • Published Aug 26 • 2
Article LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? Jul 25 • 18
Prompt Chaining or Stepwise Prompt? Refinement in Text Summarization Paper • 2406.00507 • Published Jun 1 • 1
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models Paper • 2408.02442 • Published Aug 5 • 18
Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing Paper • 2406.05534 • Published Jun 8 • 3
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 115
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding Paper • 2401.04398 • Published Jan 9 • 20
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 115
Jamba-1.5 Collection The AI21 Jamba family comprises state-of-the-art, hybrid SSM-Transformer instruction-following foundation models • 2 items • Updated Aug 22 • 80
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6 • 33
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13 • 65
Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning Paper • 2407.10718 • Published Jul 15 • 17
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28 • 60
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems Paper • 2312.15234 • Published Dec 23, 2023 • 3