Evaluating Ensemble Learning Techniques for Class Imbalance in Machine Learning: A Comparative Analysis of Balanced Random Forest, SMOTE-RF, SMOTEBoost, and RUSBoost

Authors

DOI:

https://doi.org/10.15294/sji.v11i4.15937

Keywords:

Machine Learning, Balanced Random Forest, SMOTE-RF, SMOTEBoost, RUSBoost, Random Forest, AdaBoost, Imbalanced Data, Ensemble Learning

Abstract

Purpose: This research aims to identify the optimal ensemble learning method for mitigating class imbalance in datasets by comparing four advanced techniques: balanced random forest (BRF), SMOTE-random forest (SMOTE-RF), RUSBoost, and SMOTEBoost. These methods were systematically evaluated against conventional algorithms, random forest and AdaBoost, across heterogeneous datasets with varying class imbalance ratios.
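
As a rough illustration, the compared models could be instantiated with scikit-learn and imbalanced-learn along the following lines. This is a minimal sketch with assumed default settings, not the study's configuration; since imbalanced-learn provides no SMOTEBoost implementation, it is approximated here by a SMOTE-plus-AdaBoost pipeline rather than the original per-round boosting algorithm.

# Minimal sketch, assuming scikit-learn and imbalanced-learn; not the
# study's exact configurations or hyperparameters.
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from imblearn.ensemble import BalancedRandomForestClassifier, RUSBoostClassifier
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

models = {
    # Conventional baselines with no imbalance handling
    "random_forest": RandomForestClassifier(random_state=42),
    "adaboost": AdaBoostClassifier(random_state=42),
    # Ensembles that resample internally
    "balanced_random_forest": BalancedRandomForestClassifier(random_state=42),
    "rusboost": RUSBoostClassifier(random_state=42),
    # SMOTE applied before the ensemble via an imblearn pipeline
    "smote_rf": Pipeline([("smote", SMOTE(random_state=42)),
                          ("rf", RandomForestClassifier(random_state=42))]),
    # Approximation of SMOTEBoost (true SMOTEBoost resamples at every
    # boosting round, which imbalanced-learn does not provide off the shelf)
    "smoteboost_like": Pipeline([("smote", SMOTE(random_state=42)),
                                 ("ada", AdaBoostClassifier(random_state=42))]),
}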

Methods: This study used 13 secondary datasets from diverse sources, each with a binary class output. The datasets exhibited varying degrees of class imbalance, providing a range of scenarios for assessing the effectiveness of ensemble learning techniques and traditional machine learning approaches in handling class imbalance. Each dataset was split into training (80%) and testing (20%) sets, with stratified sampling applied to maintain consistent class proportions across both sets. Each method underwent hyperparameter optimization over its own search space, repeated over 10 iterations. The methods were compared on balanced accuracy, recall, and computation time.
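
For concreteness, the evaluation protocol described above (stratified 80/20 split, hyperparameter search, and scoring by balanced accuracy, recall, and computation time) might look roughly like the following sketch. The synthetic dataset, search space, and choice of BRF as the estimator are illustrative assumptions, not the study's settings.

# Minimal sketch of the protocol described above, assuming scikit-learn and
# imbalanced-learn; dataset, search space, and estimator are illustrative.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.metrics import balanced_accuracy_score, recall_score
from imblearn.ensemble import BalancedRandomForestClassifier

# Synthetic imbalanced binary dataset standing in for one of the 13 datasets
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# Stratified 80/20 split keeps the class proportions of y in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Randomized hyperparameter search over an assumed grid, 10 sampled settings
param_space = {"n_estimators": [100, 200, 300, 500],
               "max_depth": [None, 5, 10, 20]}
search = RandomizedSearchCV(BalancedRandomForestClassifier(random_state=0),
                            param_space, n_iter=10,
                            scoring="balanced_accuracy", cv=5, random_state=0)

start = time.perf_counter()          # computation time for search + fitting
search.fit(X_train, y_train)
elapsed = time.perf_counter() - start

y_pred = search.best_estimator_.predict(X_test)
print("balanced accuracy:", balanced_accuracy_score(y_test, y_pred))
print("minority-class recall:", recall_score(y_test, y_pred, pos_label=1))
print("computation time (s):", round(elapsed, 2))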

Result: Based on the evaluation, BRF achieved the highest balanced accuracy and recall compared with SMOTE-RF, RUSBoost, SMOTEBoost, random forest, and AdaBoost, whereas the classical random forest outperformed the other techniques in computational efficiency.

Novelty: This study presents a systematic comparative analysis of advanced ensemble learning techniques, including BRF, SMOTE-RF, SMOTEBoost, and RUSBoost, which prove effective in addressing class imbalance across varied datasets. By systematically optimizing hyperparameters and applying stratified sampling, the research provides benchmark results on balanced accuracy, recall, and computational efficiency for ensemble methods on imbalanced data.

Article ID

15937

Published

30-12-2024

Issue

Vol. 11 No. 4 (2024)

Section

Articles

How to Cite

Evaluating Ensemble Learning Techniques for Class Imbalance in Machine Learning: A Comparative Analysis of Balanced Random Forest, SMOTE-RF, SMOTEBoost, and RUSBoost. (2024). Scientific Journal of Informatics, 11(4), 969-980. https://doi.org/10.15294/sji.v11i4.15937