NILE: Internal Consistency Alignment in Large Language Models
Abstract
As a crucial step toward better aligning LLMs with human intentions, Instruction Fine-Tuning (IFT) places high demands on dataset quality. However, existing IFT datasets often contain knowledge that is inconsistent with the internal knowledge LLMs acquire during pre-training, which can greatly reduce the efficacy of IFT. To address this issue, we introduce the NILE (iNternal consIstency aLignmEnt) framework, which optimizes IFT datasets to further unlock LLMs' capabilities. NILE operates by eliciting the target pre-trained LLM's internal knowledge corresponding to instruction data, then leveraging that knowledge to revise the answers in IFT datasets. Additionally, we propose a novel Internal Consistency Filtering (ICF) method that filters training samples to ensure their high consistency with the LLM's internal knowledge. Our experiments demonstrate that NILE-aligned IFT datasets sharply boost LLM performance across multiple ability evaluation benchmarks, achieving gains of up to 66.6% on Arena-Hard and 68.5% on Alpaca-Eval V2. Further analysis confirms that each component of the NILE framework contributes to these substantial improvements, and provides compelling evidence that dataset consistency with pre-trained internal knowledge is pivotal for maximizing LLM potential.
Community
Instruction fine-tuning (IFT) has proven to be a crucial method for enhancing the capabilities of LLMs. But how does instruction fine-tuning differ from traditional fine-tuning in deep learning, and can this distinction help make it more effective? Some studies suggest that fine-tuning should not focus on injecting new knowledge into pre-trained LLMs but rather on teaching them to understand tasks, emphasizing the importance of maintaining consistency with the LLM's internal knowledge during fine-tuning. This has emerged as a promising strategy for optimizing IFT datasets to further unlock the potential of LLMs. Inspired by these findings, we propose NILE (iNternal consIstency aLignmEnt), a novel framework that generates and selects better IFT data by considering the consistency between the internal parametric knowledge of LLMs and the world knowledge in IFT datasets. NILE works by eliciting the target pre-trained LLM's internal knowledge corresponding to instruction data; this knowledge is then used to revise the answers in the IFT datasets. Our experiments demonstrate that NILE-aligned IFT datasets significantly enhance LLM performance across multiple evaluation benchmarks, achieving up to a 66.6% improvement on Arena-Hard and 68.5% on Alpaca-Eval V2. Further analysis confirms that each component of the NILE framework contributes to these gains, providing compelling evidence that ensuring dataset consistency with the internal knowledge of pre-trained LLMs is pivotal for maximizing their potential.
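The pipeline described above — elicit internal knowledge, revise answers, then apply Internal Consistency Filtering (ICF) — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the mock LLM, and the token-overlap consistency proxy are all hypothetical stand-ins.

```python
# Hedged sketch of a NILE-style alignment pipeline.
# All names and the consistency metric below are illustrative assumptions.

def elicit_internal_knowledge(llm, instruction):
    # Stand-in: prompt the target pre-trained LLM to state what it
    # already "knows" about the instruction.
    return llm(instruction)

def revise_answer(original_answer, internal_knowledge):
    # Stand-in revision: append the elicited knowledge to the dataset
    # answer (the actual framework would use an LLM to rewrite it).
    return f"{original_answer} {internal_knowledge}"

def consistency_score(answer, internal_knowledge):
    # Toy proxy for internal consistency: token-level Jaccard overlap
    # between the answer and the elicited knowledge.
    a = set(answer.lower().split())
    b = set(internal_knowledge.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def nile_align(llm, dataset, threshold=0.2):
    # Revise each (instruction, answer) pair, then keep only samples
    # whose revised answer is sufficiently consistent with the LLM's
    # internal knowledge (the ICF step).
    aligned = []
    for instruction, answer in dataset:
        knowledge = elicit_internal_knowledge(llm, instruction)
        revised = revise_answer(answer, knowledge)
        if consistency_score(revised, knowledge) >= threshold:
            aligned.append((instruction, revised))
    return aligned

# Mock "LLM" and a one-sample dataset for demonstration
mock_llm = lambda prompt: f"facts about {prompt}"
data = [("photosynthesis", "plants convert light to energy")]
result = nile_align(mock_llm, data)
```

In a real setting, both the revision and the consistency scoring would themselves be performed by LLMs rather than string heuristics; the sketch only shows how the three stages compose.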