arxiv:2502.11196

How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training

Published on Feb 16 · Submitted by Ningyu on Feb 18

Abstract

Despite exceptional capabilities in knowledge-intensive tasks, Large Language Models (LLMs) face a critical gap in understanding how they internalize new knowledge, particularly how to structurally embed acquired knowledge in their neural computations. We address this issue through the lens of knowledge circuit evolution, identifying computational subgraphs that facilitate knowledge storage and processing. Our systematic analysis of circuit evolution throughout continual pre-training reveals several key findings: (1) the acquisition of new knowledge is influenced by its relevance to pre-existing knowledge; (2) the evolution of knowledge circuits exhibits a distinct phase shift from formation to optimization; (3) the evolution of knowledge circuits follows a deep-to-shallow pattern. These insights not only advance our theoretical understanding of the mechanisms of new knowledge acquisition in LLMs, but also provide potential implications for improving continual pre-training strategies to enhance model performance. Code and data will be available at https://github.com/zjunlp/DynamicKnowledgeCircuits.

Community

Paper author · Paper submitter

How do LLMs acquire new knowledge? We address this issue through the lens of knowledge circuit evolution, identifying computational subgraphs that facilitate knowledge storage and processing.

Our systematic analysis of circuit evolution throughout continual pre-training reveals several key findings: (1) the acquisition of new knowledge is influenced by its relevance to pre-existing knowledge; (2) the evolution of knowledge circuits exhibits a distinct phase shift from formation to optimization; (3) the evolution of knowledge circuits follows a deep-to-shallow pattern.
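For readers who want a concrete feel for what probing a "knowledge circuit" can look like, here is a minimal, node-level sketch that scores GPT-2 attention heads by zero-ablation and keeps the most influential ones for a single fact. The prompt, the choice of GPT-2, and the top-10 cutoff are illustrative assumptions; the paper works at the level of computational subgraphs (edges) with proper attribution methods, so this is not the authors' pipeline (see the linked repository for that).

```python
# Minimal node-level sketch of knowledge-circuit probing via zero-ablation.
# Assumptions: GPT-2, a toy factual prompt, and a top-10 cutoff -- all illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

prompt = "The Eiffel Tower is located in the city of"
target = " Paris"  # leading space matters for GPT-2's BPE
inputs = tok(prompt, return_tensors="pt")
target_id = tok(target)["input_ids"][0]

def target_logit():
    """Logit of the correct next token under the current (possibly hooked) model."""
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits[0, -1, target_id].item()

baseline = target_logit()
n_heads = model.config.n_head
head_dim = model.config.n_embd // n_heads

scores = {}
for layer in range(model.config.n_layer):
    c_proj = model.transformer.h[layer].attn.c_proj  # projects concatenated head outputs
    for head in range(n_heads):
        def ablate(module, args, head=head):
            hidden = args[0].clone()
            # Zero out this head's slice of the concatenated head outputs.
            hidden[..., head * head_dim:(head + 1) * head_dim] = 0.0
            return (hidden,) + args[1:]
        handle = c_proj.register_forward_pre_hook(ablate)
        scores[(layer, head)] = baseline - target_logit()  # logit drop when the head is removed
        handle.remove()

# Heads whose removal hurts the target logit most form a crude "circuit" for this fact.
# Tracking how this set changes across continual pre-training checkpoints mirrors, very
# roughly, the kind of evolution analysis the paper performs at the subgraph level.
top_heads = sorted(scores, key=scores.get, reverse=True)[:10]
print("Most influential heads:", top_heads)
```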

Are these findings specific to decoder-only models, or do they generalize to other architectures?

Paper author

We conduct experiments with decoder-only LLMs only, and we discuss this limitation at the end of the paper.

Useful conclusions. The point that "pre-existing is better than new knowledge" confirms my bias toward curriculum learning, but it raises the question of how to put the whole internet into a curriculum.

Paper author

I think focusing on iterative, domain-aware progression, rather than rigid, linear paths, could be one approach (a rough sketch of relevance-ordered data selection is below).
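One way to make "domain-aware progression" concrete is to order new training documents by their estimated relevance to a sample of data the model has already seen. The sketch below is purely hypothetical: the embedding model, the max-cosine-similarity score, and the toy documents are illustrative choices, not something proposed in the paper.

```python
# Hypothetical sketch of relevance-ordered curriculum construction for continual
# pre-training: documents most similar to a sample of the known corpus come first.
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice of embedder

def order_by_relevance(new_docs, known_docs):
    """Sort new_docs from most to least similar to a sample of the known corpus."""
    known_emb = embedder.encode(known_docs, normalize_embeddings=True)
    new_emb = embedder.encode(new_docs, normalize_embeddings=True)
    # Relevance = max cosine similarity of each new doc to any known doc.
    relevance = (new_emb @ known_emb.T).max(axis=1)
    order = np.argsort(-relevance)
    return [new_docs[i] for i in order]

known = ["The Eiffel Tower is in Paris.", "Water boils at 100 degrees Celsius."]
new = ["The Louvre is also in Paris.", "Quaternions extend complex numbers."]
print(order_by_relevance(new, known))  # Paris-related document is scheduled first
```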

