Early Study of Self-Perturbation Learning (SPL) Method for Mathematical Reasoning Verification

The reliance on large-scale supervised data presents a significant bottleneck for training and deploying mathematical reasoning verification models. This study introduces the Self-Perturbation Learning (SPL) method, a self-supervised alternative that eliminates the need for manual annotations. Using SPL, a verifier is trained by contrasting correct mathematical reasoning steps with automatically generated, plausible but incorrect steps (impostors). Two impostor-generation strategies are explored: (1) replacing tokens based on word embedding similarity, and (2) leveraging a Large Language Model (LLM, Gemini 2.0 Flash Lite) to create semantically nuanced perturbations. SPL models were trained with ModernBERT on the embedding-based (2 million samples) and LLM-based (100,000 samples) datasets. On the MATH-WD-Lite dataset, both SPL approaches outperformed a supervised baseline (SPL-Embedding 0.3063 and SPL-LLM 0.3812 vs. 0.2750 for the supervised model). Despite using less training data, SPL-LLM achieved the highest performance, highlighting the potential of LLM-guided perturbations. These results suggest that SPL is a promising direction for building mathematical reasoning verifiers with reduced supervision.
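
To make the embedding-based perturbation strategy concrete, the following is a minimal sketch of the general idea: swap a token in a correct reasoning step for its nearest neighbor in embedding space to produce a plausible but incorrect "impostor" step. The toy vocabulary, vectors, and helper names (EMBEDDINGS, perturb_step) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of embedding-similarity perturbation (hypothetical names and
# toy vectors; the paper's actual embeddings and tokenizer are not shown here).
import numpy as np

# Toy word-embedding table standing in for a real pretrained embedding matrix.
EMBEDDINGS = {
    "add":      np.array([0.9, 0.1, 0.0]),
    "subtract": np.array([0.8, 0.2, 0.1]),
    "multiply": np.array([0.1, 0.9, 0.0]),
    "divide":   np.array([0.2, 0.8, 0.1]),
    "2":        np.array([0.0, 0.1, 0.9]),
    "3":        np.array([0.1, 0.0, 0.8]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest_neighbor(token):
    """Return the most similar vocabulary token other than the token itself."""
    candidates = [(cosine(EMBEDDINGS[token], vec), other)
                  for other, vec in EMBEDDINGS.items() if other != token]
    return max(candidates)[1]

def perturb_step(step, rng):
    """Create an 'impostor' step by swapping one in-vocabulary token
    with its nearest embedding neighbor."""
    tokens = step.split()
    swappable = [i for i, t in enumerate(tokens) if t in EMBEDDINGS]
    if not swappable:
        return step  # nothing to perturb
    i = rng.choice(swappable)
    tokens[i] = nearest_neighbor(tokens[i])
    return " ".join(tokens)

rng = np.random.default_rng(0)
correct_step = "add 2 and 3 to get 5"
print(perturb_step(correct_step, rng))  # e.g. "subtract 2 and 3 to get 5"
```

A verifier can then be trained self-supervised on such (correct, impostor) pairs, with the original steps labeled positive and the perturbed ones negative, without any manual annotation.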

Authors:
Habibullah Akbar, Muhammad Hazim Al Farouq, Advendio Desandros, Mahmud Isnan, Bens Pardamean

2025 International Conference on Computer Science and Computational Intelligence (ICCSCI)
