arxiv:2502.01908

Unlocking Efficient Large Inference Models: One-Bit Unrolling Tips the Scales

Published on Feb 4, 2025

Abstract

Recent advances in Large Language Model (LLM) compression, such as BitNet and BitNet b1.58, have made significant strides in reducing the computational demands of LLMs through one-bit quantization techniques. We extend this frontier to Large Inference Models (LIMs), which have become indispensable across many applications but whose scale and complexity often carry a significant computational cost. We introduce a novel approach that leverages one-bit algorithm unrolling, effectively integrating information from the physical world into the model architecture. Thanks to the natural sparsity that emerges in our network architectures, our method achieves a bit-per-link rate significantly below the 1.58 bits reported in prior work. We demonstrate numerically that the proposed one-bit algorithm unrolling scheme can improve both training and test outcomes: the number of layers can be increased at little cost while the network is substantially compressed. Additionally, we provide theoretical results on the generalization gap, convergence rate, stability, and sensitivity of the proposed one-bit algorithm unrolling.
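
For intuition, here is a minimal, self-contained sketch of the two ingredients the abstract combines: BitNet-b1.58-style ternary (absmean) quantization, and an ISTA/LISTA-style unrolled network whose layers reuse a quantized step matrix. Everything below is an illustrative assumption rather than the authors' method: the function names (ternary_quantize, bits_per_link, unrolled_forward), the toy sparse-recovery setup, and the fixed, untrained weights. The entropy printout illustrates the bit-per-link point: dense ternary weights cost about log2(3) ≈ 1.58 bits per link, while sparse ones cost substantially less.

```python
import numpy as np

def ternary_quantize(W, eps=1e-8):
    """BitNet-b1.58-style absmean quantization to {-1, 0, +1}.
    A generic sketch of the published scheme, not this paper's exact
    one-bit unrolling quantizer."""
    gamma = np.mean(np.abs(W)) + eps          # absmean scaling factor
    Wq = np.clip(np.round(W / gamma), -1, 1)  # ternary links
    return Wq, gamma

def bits_per_link(Wq):
    """Empirical entropy of the quantized links, in bits. Dense ternary
    weights cost up to log2(3) ~= 1.58 bits/link; the more zero links
    (sparser weights), the lower the entropy."""
    _, counts = np.unique(Wq, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def soft_threshold(x, theta):
    """Proximal map of the L1 norm: the nonlinearity in ISTA-style unrolling."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def unrolled_forward(A, y, n_layers=20, theta=0.05):
    """A LISTA-style unrolled network: each layer applies one quantized
    ISTA step, x <- soft_threshold(x + W (y - A x), theta). Deepening
    the network is just running more layers, with no additional
    full-precision parameters (illustrative, untrained weights)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz const. of grad ||y - Ax||^2
    Wq, gamma = ternary_quantize(A.T / L)  # quantized step matrix
    W = gamma * Wq
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):
        x = soft_threshold(x + W @ (y - A @ x), theta)
    return x

rng = np.random.default_rng(0)

# Sparsity drives the per-link cost below the dense ternary 1.58 bits.
W_dense = rng.standard_normal((256, 256))
print(f"dense : {bits_per_link(ternary_quantize(W_dense)[0]):.2f} bits/link")
mask = rng.random((256, 256)) < 0.1        # keep ~10% of the links
print(f"sparse: {bits_per_link(ternary_quantize(W_dense * mask)[0]):.2f} bits/link")

# Toy sparse-recovery pass through the unrolled, quantized network.
A = rng.standard_normal((50, 100)) / np.sqrt(50)
x_true = np.zeros(100)
x_true[rng.choice(100, size=5, replace=False)] = 1.0
x_hat = unrolled_forward(A, A @ x_true)
err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative error after 20 quantized layers: {err:.2f}")  # illustrative only
```

In the paper, the unrolled layers are learned and encode a physical model of the measurement process; here a generic, fixed ISTA step stands in purely to keep the sketch runnable.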
