📃 Paper ⚙️ Code 🤗 Data 📭 Contact

Overview

Process Reward Models (PRMs) evaluate the reasoning process step by step and provide guidance to policy models, offering the potential to further enhance the reasoning capabilities of LLMs. However, the performance of current PRMs remains limited.

We propose Reasoning-Driven Process Reward Modeling (R-PRM), a novel approach that enhances LLMs' ability to evaluate mathematical reasoning step by step. Our framework consists of three stages: a supervised cold start, meta-optimization in a self-evolving style, and finally inference-time scaling.
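As a rough illustration of the inference-time-scaling stage, the sketch below samples several independent judgments for each reasoning step and aggregates their scores. The `judge_step` function is a hypothetical stand-in (here just simulated noise) for the actual R-PRM model, which would generate a reasoning-based critique before scoring the step; the sampling-and-aggregation structure is the point, not the scorer.

```python
import random
from statistics import mean

def judge_step(step: str, rng: random.Random) -> float:
    """Hypothetical stand-in for one sampled R-PRM judgment.

    The real model would produce a step-by-step critique of the
    candidate step and end with a correctness score; here we only
    simulate a noisy score in [0, 1].
    """
    return min(1.0, max(0.0, rng.gauss(0.7, 0.1)))

def score_step(step: str, k: int = 8, seed: int = 0) -> float:
    """Inference-time scaling: sample k independent judgments of the
    same step and aggregate them (averaging, as one simple choice)."""
    rng = random.Random(seed)
    return mean(judge_step(step, rng) for _ in range(k))

# Score each step of a toy solution trace.
steps = ["Let x = 3.", "Then 2x + 1 = 7.", "So the answer is 7."]
scores = [score_step(s, k=8, seed=i) for i, s in enumerate(steps)]
print([round(s, 2) for s in scores])
```

Averaging is only one aggregation choice; voting or taking the minimum step score are equally plausible under this setup.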

![Overview of the R-PRM framework](R-PRM.jpg)

🏆 Experiment Results

🧪 Data Efficiency

R-PRM demonstrates exceptional data efficiency across varying training-data scales:

![Performance under varying training-data scales](DataScaling.png)

📊 ProcessBench