Integration of Random Forest, ADASYN, and SHAP for Diabetes Prediction and Interpretation

Authors

  • Hozana Aulia, Master of Information Systems, Postgraduate School, Universitas Diponegoro, Indonesia
  • Adi Wibowo, Department of Informatics, Faculty of Science and Mathematics, Universitas Diponegoro, Indonesia
  • Sutrisno Sutrisno, Department of Mathematics, Faculty of Science and Mathematics, Universitas Diponegoro, Indonesia

DOI:

https://doi.org/10.15294/sji.v12i2.24314

Keywords:

SHAP, ADASYN, Random forest, Diabetes prediction, Machine learning

Abstract

Purpose: Diabetes is a chronic disease with a globally rising prevalence. Early detection of individuals at risk is essential to prevent long-term complications. This study aims to develop a diabetes prediction model that not only achieves high classification accuracy but also provides transparent explanations of the factors influencing its predictions.

Methods: The study used the Pima Indians Diabetes Dataset, which contains clinical records for 768 female patients aged 21 years or older. The workflow comprised data preprocessing (handling missing values and engineering the Age_BMI and Glucose_BMI features), a 70:30 train-test split, class balancing with ADASYN, Random Forest training with hyperparameter tuning via GridSearchCV, and interpretability analysis with SHAP.

Results: The proposed model achieved an accuracy of 79.2% and a recall of 85.2% on the test data. SHAP analysis revealed that Glucose, Age_BMI, BMI, and DiabetesPedigreeFunction were the most influential features in predicting diabetes. Furthermore, the SHAP heatmap indicated that individuals aged 30–50 years with obesity were at the highest risk. These findings align with existing medical literature, reinforcing the role of metabolic and age-related factors in diabetes development.
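For readers less familiar with the two reported metrics, the short sketch below shows how accuracy and recall follow from confusion-matrix counts. The counts are hypothetical, chosen only so the arithmetic lands near the reported percentages; the abstract does not publish the study's confusion matrix.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all test cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp: int, fn: int) -> float:
    """Share of truly diabetic cases that the model flagged (sensitivity)."""
    return tp / (tp + fn)


# Hypothetical counts for illustration only (not taken from the paper).
tp, fn, tn, fp = 69, 12, 114, 36

print(f"accuracy = {accuracy(tp, fp, tn, fn):.1%}")  # accuracy = 79.2%
print(f"recall   = {recall(tp, fn):.1%}")            # recall   = 85.2%
```

Recall is the more clinically relevant of the two in a screening setting, since a false negative corresponds to a missed at-risk patient.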

Novelty: This study presents an integrative approach combining class balancing (ADASYN), classification (Random Forest), and model interpretability (SHAP) in a unified framework for diabetes prediction. It emphasizes the importance of transparent model interpretation for healthcare professionals, enabling not only predictive outcomes but also actionable insights into risk factors. The findings support future research opportunities, including the integration of lifestyle variables and external validation using real-world clinical data from diverse populations.

Author Biographies

  • Adi Wibowo, Department of Informatics, Faculty of Science and Mathematics, Universitas Diponegoro, Indonesia

    Dr. Eng. Adi Wibowo, S.Si., M.Kom., Lecturer at the Department of Informatics, Universitas Diponegoro.

  • Sutrisno Sutrisno, Department of Mathematics, Faculty of Science and Mathematics, Universitas Diponegoro, Indonesia

    Dr. Sutrisno, S.Si., M.Sc., Lecturer at the Department of Mathematics, Universitas Diponegoro.

Published

23-06-2025

Article ID

24314

Issue

Vol. 12 No. 2 (2025)

Section

Articles

How to Cite

Aulia, H., Wibowo, A., & Sutrisno, S. (2025). Integration of Random Forest, ADASYN, and SHAP for Diabetes Prediction and Interpretation. Scientific Journal of Informatics, 12(2), 211–222. https://doi.org/10.15294/sji.v12i2.24314