arxiv:2411.19477

A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models

Published on Nov 29

· Submitted by

yanxi-chen on Dec 3

Upvote

Authors:

Yanxi Chen ,

Xuchen Pan ,

Jingren Zhou

Abstract

We propose a general two-stage algorithm that enjoys a provable scaling law for the test-time compute of large language models (LLMs). Given an input problem, the proposed algorithm first generates N candidate solutions, and then chooses the best one via a multiple-round knockout tournament where each pair of candidates are compared for K times and only the winners move on to the next round. In a minimalistic implementation, both stages can be executed with a black-box LLM alone and nothing else (e.g., no external verifier or reward model), and a total of N times (K + 1) highly parallelizable LLM calls are needed for solving an input problem. Assuming that a generated candidate solution is correct with probability p_{gen} > 0 and a comparison between a pair of correct and incorrect solutions identifies the right winner with probability p_{comp} > 0.5 (i.e., better than a random guess), we prove theoretically that the failure probability of the proposed algorithm decays to zero exponentially with respect to N and K: $P(final output is incorrect) le (1 - p_{gen})^N + lceil log_2 N rceil e^{-2 K (p_{comp} - 0.5)^2}.$ Our empirical results with the challenging MMLU-Pro benchmark validate the technical assumptions, as well as the efficacy of the proposed algorithm and the gains from scaling up its test-time compute.

View arXiv page View PDF Add to collection

Community

yanxi-chen

Paper author Paper submitter 23 days ago

This work proposes a general two-stage algorithm that enjoys a provable scaling law for the test-time compute of large language models (LLMs). Given an input problem, the proposed algorithm first generates multiple candidate solutions, and then chooses a single one among them as the final output, via a knockout tournament where pairwise comparisons among the candidates are conducted. Under certain technical assumptions about the input problem and LLM(s), it can be proved theoretically that the failure probability of the proposed algorithm decays to zero exponentially with respect to the amount of test-time compute.

librarian-bot

22 days ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2411.19477 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2411.19477 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2411.19477 in a Space README.md to link it from this page.