Boosting Algorithm to Handle Unbalanced Classification of PM2.5 Concentration Levels by Observing Meteorological Parameters in Jakarta-Indonesia Using AdaBoost, XGBoost, CatBoost, and LightGBM

Air quality in the Jakarta area has become more severe, and the city ranks among the world’s eight worst according to the 2022 Air Quality Index (AQI) report. In particular, the latest air quality data for Jakarta and surrounding areas from the Meteorological, Climatological, and Geophysical Agency (BMKG) of the Republic of Indonesia show that PM2.5 concentrations have increased, peaking at 148 μg/m³ in 2022. A classification system for this pollution is therefore critical, but the PM2.5 concentrations observed at the BMKG Kemayoran station in Jakarta form an unbalanced class distribution. In this work, we therefore apply supervised boosting algorithms to handle this unbalanced classification of PM2.5 concentration levels by observing meteorological patterns in Jakarta from 1 January 2015 to 7 July 2022. The boosting algorithms considered in this research include Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Light Gradient Boosting Machine (LightGBM). Our simulations show that boosting classification can significantly reduce bias, together with variance, under unbalanced within-class coefficients, with classification values for the PM2.5 classes of 62% (good), 34% (moderate), and 59% (unhealthy).
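To illustrate why boosting can help with unbalanced classes, the following is a minimal sketch of a single AdaBoost reweighting round in its multiclass SAMME form. The abstract does not state which AdaBoost variant was used, so SAMME is an assumption here, and the function name `samme_reweight` and the toy labels are hypothetical and not taken from the paper's data.

```python
import math

def samme_reweight(weights, misclassified, n_classes):
    """One AdaBoost (SAMME) round: return (alpha, normalized new weights)."""
    total = sum(weights)
    # Weighted error of the current weak learner.
    err = sum(w for w, m in zip(weights, misclassified) if m) / total
    # SAMME needs a learner that errs sometimes but beats random guessing.
    if not 0.0 < err < 1.0 - 1.0 / n_classes:
        raise ValueError("weak learner error must lie in (0, 1 - 1/K)")
    # Learner weight; the log(K - 1) term is the multiclass correction.
    alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
    # Up-weight only the misclassified samples, then renormalize.
    boosted = [w * math.exp(alpha) if m else w
               for w, m in zip(weights, misclassified)]
    norm = sum(boosted)
    return alpha, [w / norm for w in boosted]

# Hypothetical toy sample: 8 "good" days and 2 "unhealthy" days.
# Assume the weak learner misses both minority-class days.
labels = ["good"] * 8 + ["unhealthy"] * 2
misclassified = [label == "unhealthy" for label in labels]
weights = [1.0 / len(labels)] * len(labels)

# Three classes, matching the good / moderate / unhealthy levels.
alpha, new_weights = samme_reweight(weights, misclassified, n_classes=3)
minority_share = sum(w for w, m in zip(new_weights, misclassified) if m)
print(f"alpha = {alpha:.3f}, minority weight share = {minority_share:.3f}")
```

In this toy case the two missed minority samples start with 20% of the total weight and carry about two thirds of it after one round, so the next weak learner is pushed to fit them. XGBoost, CatBoost, and LightGBM are gradient-based and use different update rules that this sketch does not cover.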

Authors:
Toni Toharudin, Rezzy Eko Caraka, Indah Reski Pratiwi, Yunho Kim, Prana Ugiana Gio, Anjar Dimara Sakti, Maengseok Noh, Farid Azhar Lutfi Nugraha, Resa Septiani Pontoh, Tafia Hasna Putri, Thalita Safa Azzahra, Jessica Jesslyn Cerelia, Gumgum Darmawan, and Bens Pardamean

IEEE Access
