kaizuberbuehler's Collections

LM Capabilities and Scaling
- Compression Represents Intelligence Linearly
  Paper • 2404.09937 • Published • 27

- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
  Paper • 2404.06395 • Published • 22

- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 36

- Are large language models superhuman chemists?
  Paper • 2404.01475 • Published • 18

- FlowMind: Automatic Workflow Generation with LLMs
  Paper • 2404.13050 • Published • 34

- Capabilities of Gemini Models in Medicine
  Paper • 2404.18416 • Published • 24

- Imp: Highly Capable Large Multimodal Models for Mobile Devices
  Paper • 2405.12107 • Published • 28

- On the Planning Abilities of Large Language Models -- A Critical Investigation
  Paper • 2305.15771 • Published • 1

- Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
  Paper • 2406.09170 • Published • 27

- MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
  Paper • 2406.09411 • Published • 20

- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
  Paper • 2406.07394 • Published • 27

- GEB-1.3B: Open Lightweight Large Language Model
  Paper • 2406.09900 • Published • 21

- Mixture of A Million Experts
  Paper • 2407.04153 • Published • 5

- Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
  Paper • 2404.05405 • Published • 10

- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
  Paper • 2408.06195 • Published • 70

- Attention Heads of Large Language Models: A Survey
  Paper • 2409.03752 • Published • 89

- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
  Paper • 2409.16191 • Published • 42

- Making Text Embedders Few-Shot Learners
  Paper • 2409.15700 • Published • 30

- Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
  Paper • 2406.14546 • Published • 2

- Are Your LLMs Capable of Stable Reasoning?
  Paper • 2412.13147 • Published • 92

- CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
  Paper • 2501.01257 • Published • 50

- ProgCo: Program Helps Self-Correction of Large Language Models
  Paper • 2501.01264 • Published • 25

- Paper • 2412.04315 • Published • 19

- Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
  Paper • 2411.17691 • Published • 13

- PokerBench: Training Large Language Models to become Professional Poker Players
  Paper • 2501.08328 • Published • 17

- Do generative video models learn physical principles from watching videos?
  Paper • 2501.09038 • Published • 32

- Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
  Paper • 2501.12370 • Published • 11

- Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
  Paper • 2501.16975 • Published • 26

- Large Language Models Think Too Fast To Explore Effectively
  Paper • 2501.18009 • Published • 23

- The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
  Paper • 2502.08946 • Published • 182

- Scaling Embedding Layers in Language Models
  Paper • 2502.01637 • Published • 23

- Great Models Think Alike and this Undermines AI Oversight
  Paper • 2502.04313 • Published • 30

- Scaling Pre-training to One Hundred Billion Data for Vision Language Models
  Paper • 2502.07617 • Published • 28

- Gemstones: A Model Suite for Multi-Faceted Scaling Laws
  Paper • 2502.06857 • Published • 23

- Distillation Scaling Laws
  Paper • 2502.08606 • Published • 46

- NoLiMa: Long-Context Evaluation Beyond Literal Matching
  Paper • 2502.05167 • Published • 15