{"id":2987,"date":"2026-04-27T12:00:32","date_gmt":"2026-04-27T05:00:32","guid":{"rendered":"https:\/\/research.binus.ac.id\/airdc\/?p=2987"},"modified":"2026-05-04T10:31:41","modified_gmt":"2026-05-04T03:31:41","slug":"retrieval-augmented-generation-chunking-strategies-for-financial-document-analysis-a-systematic-literature-review","status":"publish","type":"post","link":"https:\/\/research.binus.ac.id\/airdc\/2026\/04\/retrieval-augmented-generation-chunking-strategies-for-financial-document-analysis-a-systematic-literature-review\/","title":{"rendered":"Retrieval-Augmented Generation Chunking Strategies for Financial Document Analysis: A Systematic Literature Review"},"content":{"rendered":"<p>Retrieval-Augmented Generation (RAG) is an important architecture for generative AI. In particular, it has been used to automate the nuanced processing of financial reports, a key area of regulatory scrutiny for tax authorities and the financial and banking sectors. The performance of any RAG system, however, is ultimately determined by its document splitting policy. When traditional fixed-size text chunking approaches are applied to financial statements, which are semi-structured documents with mixed data types, they cause context fragmentation and semantic incoherence, resulting in major degradation in subsequent retrieval and generation. This systematic literature review investigates the emerging importance of chunking strategies in the financial context. In line with PRISMA 2020, a total of 16 recent studies were synthesized to examine the effects of different chunking approaches, ranging from naive text splitting to more complex, layout-aware parsing. Our analysis shows that naive fixed-size chunking is the most common strategy, whereas structure-aware and semantic chunking approaches improve precision by up to 33.42%. 
We discuss known trade-offs among chunk size granularity, retrieval precision, and recall, and identify architectural patterns for advanced structure-aware solutions. The results show a clear shift from the previously arbitrary chunking paradigm toward semantic chunking that captures a significant portion, if not the entirety, of the relational and logical nature of the document.<\/p>\n<p><strong>Authors:<br \/>\n<\/strong>Ryan Agatha Nanda Widiiswa, Matthew Martianus Henry, Alyssa Imani, Bens Pardamean.<\/p>\n<p><em>2026 30th International Conference on Information Technology<\/em><\/p>\n<div class=\"ewa-rteLine\"><a href=\"https:\/\/www.researchgate.net\/publication\/403069835_Retrieval-Augmented_Generation_Chunking_Strategies_for_Financial_Document_Analysis_A_Systematic_Literature_Review\">Read Full Text<\/a><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Retrieval-Augmented Generation (RAG) is an important architecture for generative AI. In particular, it has been used to automate the nuanced processing of financial reports, a key area of regulatory scrutiny for tax authorities and the financial and banking sectors. The performance of any RAG system, however, is ultimately determined by its document splitting policy. 
When traditional fixed-size text chunking [&hellip;]<\/p>\n","protected":false},"author":14,"featured_media":2988,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16],"tags":[],"class_list":["post-2987","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-publications"],"_links":{"self":[{"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/posts\/2987","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/comments?post=2987"}],"version-history":[{"count":1,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/posts\/2987\/revisions"}],"predecessor-version":[{"id":2989,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/posts\/2987\/revisions\/2989"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/media\/2988"}],"wp:attachment":[{"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/media?parent=2987"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/categories?post=2987"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/research.binus.ac.id\/airdc\/wp-json\/wp\/v2\/tags?post=2987"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}