{"id":2813,"date":"2025-09-26T22:24:36","date_gmt":"2025-09-26T15:24:36","guid":{"rendered":"https:\/\/research.binus.ac.id\/airdc\/?p=2813"},"modified":"2025-09-26T22:24:44","modified_gmt":"2025-09-26T15:24:44","slug":"early-study-of-self-perturbation-learning-spl-method-for-mathematical-reasoning-verification","status":"publish","type":"post","link":"https:\/\/research.binus.ac.id\/airdc\/2025\/09\/early-study-of-self-perturbation-learning-spl-method-for-mathematical-reasoning-verification\/","title":{"rendered":"Early Study of Self-Perturbation Learning (SPL) Method for Mathematical Reasoning Verification"},"content":{"rendered":"<p style=\"text-align: justify\"><span data-sheets-root=\"1\">The reliance on large-scale supervised data presents a significant bottleneck for training and deploying mathematical reasoning verification models. This study introduces the Self-Perturbation Learning (SPL) method, a self-supervised alternative that eliminates the need for manual annotations. Using SPL, a verifier is trained by contrasting correct mathematical reasoning steps with automatically generated, plausible but incorrect steps (impostors). Two lie-generation strategies are explored: (1) replacing tokens based on word embedding similarity, and (2) leveraging a Large Language Model (LLM, Gemini 2.0 Flash Lite) to create semantically nuanced perturbations. SPL models were trained using ModernBERT on embedding-based (2 million samples) and LLM-based (100,000 samples) datasets. On the MATH-WD-Lite dataset, both SPL approaches outperformed a supervised baseline (0.3063 SPL-Embedding, 0.3812 SPL-LLM vs. 0.2750 Supervised). Despite using less training data, SPL-LLM achieved the highest performance, highlighting the potential of LLM-guided perturbations. These results suggest SPL as a promising direction for building mathematical reasoning verifiers with reduced supervision.<\/span><\/p>\n<p><strong>Authors:<\/strong><br \/>\n<span data-sheets-root=\"1\">Habibullah Akbar, Muhammad Hazim Al Farouq, Advendio Desandros, Mahmud Isnan, Bens Pardamean<\/span><\/p>\n<p><em>2025 International Conference on Computer Science and Computational Intelligence (ICCSCI)<\/em><\/p>\n<p><a href=\"https:\/\/www.researchgate.net\/publication\/394120507_Early_Study_of_Self-Perturbation_Learning_SPL_Method_for_Mathematical_Reasoning_Verification\" target=\"_blank\" rel=\"noopener\"><strong>Read Full Article<\/strong><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The reliance on large-scale supervised data presents a significant bottleneck for training and deploying mathematical reasoning verification models. This study introduces the Self-Perturbation Learning (SPL) method, a self-supervised alternative that eliminates the need for manual annotations. Using SPL, a verifier is trained by contrasting correct mathematical reasoning steps with automatically generated, plausible but incorrect steps [&hellip;]<\/p>\n","protected":false},"author":14,"featured_media":2814,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-2813","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research"],"_links":{"self":[{"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/posts\/2813","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/comments?post=2813"}],"version-history":[{"count":1,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/posts\/2813\/revisions"}],"predecessor-version":[{"id":2815,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/posts\/2813\/revisions\/2815"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/media\/2814"}],"wp:attachment":[{"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/media?parent=2813"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/categories?post=2813"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/tags?post=2813"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}