Comparison of Ensemble Forest-Based Methods Performance for Imbalanced Data Classification

Yunia Hasnataeni; Asep Saefuddin; Agus Mohamad Soleh

doi:10.15294/sji.v12i2.24269

Authors

Yunia Hasnataeni, Statistics and Data Science Department, IPB University, Indonesia. ORCID: https://orcid.org/0009-0005-1348-8492
Asep Saefuddin, Statistics and Data Science Department, IPB University, Indonesia. ORCID: https://orcid.org/0000-0002-1694-9515
Agus M Soleh, Statistics and Data Science Department, IPB University, Indonesia. ORCID: https://orcid.org/0000-0002-2732-1985

Keywords:

Random forest-based methods, Imbalanced data, Resampling techniques, Ensemble learning, Rainfall classification

Abstract

Purpose: Classifying imbalanced data is a major challenge in meteorological studies, particularly in rainfall classification, where extreme events occur infrequently. This research addresses the issue by evaluating how well ensemble learning models handle imbalanced rainfall data from Bogor Regency, with the aim of improving classification performance and model reliability for hydrometeorological risk mitigation.

Methods: Four ensemble methods (RF, RoF, DRF, and RoDRF) were applied to rainfall classification in combination with three resampling techniques: SMOTE, RUS, and SMOTE-RUS-NC. The data underwent preprocessing, stratified splitting, resampling, and 5-fold cross-validation. Performance was evaluated over 100 iterations using accuracy, precision, recall, and F1-score.
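To make the resampling and evaluation steps concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a binary problem with label 1 for the rare (extreme rainfall) class, uses made-up feature rows, and the function names `random_undersample`, `smote_like_oversample`, and `binary_metrics` are hypothetical. The oversampler only imitates the core SMOTE idea (interpolating between a minority row and one of its nearest minority neighbours); SMOTE-RUS-NC is not reproduced here.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """RUS-style step: randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_like_oversample(X, y, minority=1, k=3, seed=0):
    """SMOTE-style step: add synthetic minority rows until the minority matches the largest class.

    Each synthetic row lies on the segment between a random minority row and
    one of its k nearest minority neighbours (squared Euclidean distance).
    Assumes at least two minority rows.
    """
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    counts = Counter(y)
    n_needed = max(counts.values()) - counts[minority]
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        others = [r for r in minority_rows if r is not base]
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for the positive (rare) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up training rows (two illustrative features); label 1 is the rare class.
X_train = [[80.0, 31.0], [82.0, 30.5], [79.0, 32.0], [95.0, 27.0], [96.0, 26.5], [94.0, 27.5]]
y_train = [0, 0, 0, 0, 1, 1]

X_rus, y_rus = random_undersample(X_train, y_train)
X_smote, y_smote = smote_like_oversample(X_train, y_train, minority=1)
print(Counter(y_train), Counter(y_rus), Counter(y_smote))
```

In a pipeline like the one described, resampling of this kind would normally be applied to the training portion only, so that the held-out folds keep the original class ratio; that ordering is an assumption consistent with the step order listed above rather than a detail stated in the abstract.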

Results: DRF combined with SMOTE-RUS-NC offered the best balance between accuracy (0.989) and computation time (107.28 seconds), while RoDRF with SMOTE achieved the highest overall performance, with an accuracy of 0.991, at the cost of a longer computation time (149.30 seconds). Feature importance analysis identified average humidity, maximum temperature, and minimum temperature as the most influential predictors of extreme rainfall.

Novelty: This research provides a comprehensive comparison of ensemble forest-based methods for imbalanced rainfall data and identifies DRF with SMOTE-RUS-NC as a favorable trade-off between performance and efficiency. The findings support improved rainfall classification models and offer practical insight for disaster mitigation planning and resource management in tropical regions.
