Optimization of Random Forest Algorithm with SMOTE Method to Improve the Accuracy of Early Diabetes Prediction

Authors

  • Siti Khoirun Nisa, Informatics Engineering, Universitas Nahdlatul Ulama Sunan Giri, Indonesia
  • Mula Agung Barata, Informatics Engineering, Universitas Nahdlatul Ulama Sunan Giri, Indonesia
  • Pelangi Eka Yuwita, Mechanical Engineering, Universitas Nahdlatul Ulama Sunan Giri, Indonesia

DOI:

https://doi.org/10.15294/sji.v12i3.22986

Keywords:

Diabetes, Classification, Random Forest, SMOTE

Abstract

Purpose: This research examines the performance of the Random Forest algorithm in diabetes risk classification when the data are balanced with the Synthetic Minority Oversampling Technique (SMOTE), with the aim of improving the representation of the minority class and increasing prediction accuracy.

Methods: The study used the Behavioral Risk Factor Surveillance System (BRFSS) dataset, obtained from Kaggle, which contains health-related survey data used to identify individuals at risk of diabetes. The Random Forest algorithm was applied to classify diabetes, and the SMOTE method was used to balance the data. The model's performance was evaluated using 10-fold cross-validation, comparing the results obtained before and after applying SMOTE.
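
The balancing step described above can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its k nearest minority neighbours. The abstract does not state which implementation or parameters the authors used, so the function names `smote` and `balance`, the default `k=5`, the binary-label assumption, and the toy data below are all illustrative assumptions, not the study's actual code.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature vectors (minority class only).
    k: number of nearest minority neighbours considered for each base sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, sorted by distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample minority_label until both classes have equal size (binary labels assumed)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_rows = smote(minority, max(n_majority - len(minority), 0), k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


# Hypothetical toy data: six majority samples (label 0), two minority samples (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.1],
     [2.0, 2.0], [2.2, 2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance(X, y, minority_label=1)
```

In this toy case the four synthetic rows all fall on the segment between the two original minority points, and both classes end up with six samples. A classifier such as Random Forest would then be trained on the balanced data.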

Result: The results showed that applying the SMOTE method improved the performance of the Random Forest classification model, especially for the minority class. Without SMOTE, the model performed poorly on the minority class, with a precision of 49%, a recall of 18%, and an F1-score of 26%. After applying SMOTE, these values increased to a precision of 96%, a recall of 88%, and an F1-score of 92%, representing improvements of 47 percentage points in precision, 70 points in recall, and 66 points in F1-score. The overall accuracy of the Random Forest model also increased from 86% to 92%, a 6 percentage point improvement.
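
The arithmetic behind these figures can be checked with a short sketch. It recomputes the percentage-point differences and, as a consistency check, derives the F1-score from the reported precision and recall using the standard harmonic-mean formula. The helper `f1_score` and the dictionary layout are illustrative assumptions. The reported metrics may be averages over the ten folds, so agreement after rounding is expected but not guaranteed in general.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent. Precision, recall and F1 refer to
# the minority class; accuracy is the overall accuracy of the model.
before = {"precision": 49, "recall": 18, "f1": 26, "accuracy": 86}
after = {"precision": 96, "recall": 88, "f1": 92, "accuracy": 92}

# Percentage-point change for each metric (after minus before).
gains = {name: after[name] - before[name] for name in before}

# F1 recomputed from the reported precision and recall, rounded to whole percent.
f1_before = round(100 * f1_score(0.49, 0.18))
f1_after = round(100 * f1_score(0.96, 0.88))
```

Here `gains` reproduces the 47, 70, 66, and 6 point improvements, and the recomputed F1 values round to the reported 26% and 92%.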

Novelty: This study integrates the Random Forest algorithm with the SMOTE technique and validates the results using 10-fold cross-validation. The combination significantly improves minority class prediction performance in early diabetes detection, addressing a common limitation of previous studies, which did not handle imbalanced datasets effectively.

Published

04-08-2025

Article ID

22986

Section

Articles

How to Cite

Optimization of Random Forest Algorithm with SMOTE Method to Improve the Accuracy of Early Diabetes Prediction. (2025). Scientific Journal of Informatics, 12(3), 387-396. https://doi.org/10.15294/sji.v12i3.22986