Temporal Consistency for LLM Reasoning Process Error Identification
Abstract
Verification is crucial for effective mathematical reasoning. We present a new temporal consistency method in which verifiers iteratively refine their judgments based on their previous assessments. Unlike one-round verification or multi-model debate approaches, our method leverages consistency across a sequence of self-reflection actions to improve verification accuracy. Empirical evaluations across diverse mathematical process error identification benchmarks (MathCheck, ProcessBench, and PRM800K) show consistent performance improvements over baseline methods. When applied to the recent DeepSeek-R1 distilled models, our method demonstrates strong performance, enabling 7B/8B distilled models to outperform all 70B/72B models and GPT-4o on ProcessBench. Notably, the distilled 14B model with our method achieves performance comparable to DeepSeek-R1. Our code is available at https://github.com/jcguo123/Temporal-Consistency
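The abstract describes verifiers that iteratively re-judge a reasoning step, conditioning each round on the previous assessment and relying on consistency across rounds. A minimal sketch of that loop, assuming a user-supplied `verify_step` callable (the function name, stability window, and fallback majority vote are illustrative assumptions, not the authors' implementation):

```python
from typing import Callable, List

def temporally_consistent_verify(
    verify_step: Callable[[str, List[bool]], bool],
    solution_step: str,
    max_rounds: int = 10,
    stability_window: int = 3,
) -> bool:
    """Re-verify a reasoning step over several rounds, passing the judgment
    history back to the verifier, and stop once the verdict is stable."""
    history: List[bool] = []
    for _ in range(max_rounds):
        verdict = verify_step(solution_step, history)
        history.append(verdict)
        # Converged: the last `stability_window` verdicts all agree.
        if len(history) >= stability_window and len(set(history[-stability_window:])) == 1:
            return history[-1]
    # No convergence within the budget: fall back to a majority vote.
    return sum(history) > len(history) / 2
```

In practice `verify_step` would prompt an LLM with the step and its own prior judgments; here any callable with that signature works, e.g. `temporally_consistent_verify(lambda step, hist: True, "x = 2")` returns `True` after the verdict stabilizes.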
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Bag of Tricks for Inference-time Computation of LLM Reasoning (2025)
- Dyve: Thinking Fast and Slow for Dynamic Process Verification (2025)
- Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights (2025)
- "Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding (2025)
- Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation (2025)
- CER: Confidence Enhanced Reasoning in LLMs (2025)
- Uncertainty-Aware Step-wise Verification with Generative Reward Models (2025)