kaizuberbuehler's Collections

LM Capabilities and Scaling
- Compression Represents Intelligence Linearly
  Paper • 2404.09937 • Published • 27

- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
  Paper • 2404.06395 • Published • 22

- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 36

- Are large language models superhuman chemists?
  Paper • 2404.01475 • Published • 18

- FlowMind: Automatic Workflow Generation with LLMs
  Paper • 2404.13050 • Published • 34

- Capabilities of Gemini Models in Medicine
  Paper • 2404.18416 • Published • 24

- Imp: Highly Capable Large Multimodal Models for Mobile Devices
  Paper • 2405.12107 • Published • 28

- On the Planning Abilities of Large Language Models -- A Critical Investigation
  Paper • 2305.15771 • Published • 1

- Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
  Paper • 2406.09170 • Published • 27

- MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
  Paper • 2406.09411 • Published • 20

- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
  Paper • 2406.07394 • Published • 27

- GEB-1.3B: Open Lightweight Large Language Model
  Paper • 2406.09900 • Published • 21

- Mixture of A Million Experts
  Paper • 2407.04153 • Published • 5

- Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
  Paper • 2404.05405 • Published • 10

- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
  Paper • 2408.06195 • Published • 70

- Attention Heads of Large Language Models: A Survey
  Paper • 2409.03752 • Published • 89

- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
  Paper • 2409.16191 • Published • 42

- Making Text Embedders Few-Shot Learners
  Paper • 2409.15700 • Published • 30

- Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
  Paper • 2406.14546 • Published • 2

- Are Your LLMs Capable of Stable Reasoning?
  Paper • 2412.13147 • Published • 92

- CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
  Paper • 2501.01257 • Published • 50

- ProgCo: Program Helps Self-Correction of Large Language Models
  Paper • 2501.01264 • Published • 25

- Paper • 2412.04315 • Published • 19

- Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
  Paper • 2411.17691 • Published • 13

- PokerBench: Training Large Language Models to become Professional Poker Players
  Paper • 2501.08328 • Published • 17

- Do generative video models learn physical principles from watching videos?
  Paper • 2501.09038 • Published • 32

- Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
  Paper • 2501.12370 • Published • 11

- Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
  Paper • 2501.16975 • Published • 26

- Large Language Models Think Too Fast To Explore Effectively
  Paper • 2501.18009 • Published • 23

- The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
  Paper • 2502.08946 • Published • 182

- Scaling Embedding Layers in Language Models
  Paper • 2502.01637 • Published • 23

- Great Models Think Alike and this Undermines AI Oversight
  Paper • 2502.04313 • Published • 30

- Scaling Pre-training to One Hundred Billion Data for Vision Language Models
  Paper • 2502.07617 • Published • 28

- Gemstones: A Model Suite for Multi-Faceted Scaling Laws
  Paper • 2502.06857 • Published • 23

- Distillation Scaling Laws
  Paper • 2502.08606 • Published • 46

- NoLiMa: Long-Context Evaluation Beyond Literal Matching
  Paper • 2502.05167 • Published • 15