Principal Component Analysis for Prediabetes Prediction using Extreme Gradient Boosting (XGBoost)

Authors

  • Kartina Diah Kesuma Wardhani, Politeknik Caltex Riau
  • Wenda Novayani, Politeknik Caltex Riau

DOI:

https://doi.org/10.15294/sji.v11i3.13416

Keywords:

Prediabetes, Medical data, Principal component analysis, XGBoost

Abstract

Purpose: This study aims to improve the accuracy of prediabetes prediction models. It combines Principal Component Analysis (PCA), used to reduce the dimensionality of the data, with Extreme Gradient Boosting (XGBoost). The study offers an alternative approach to predicting prediabetes in patients: reducing the complexity of the dataset in order to increase the accuracy of the resulting model. PCA and XGBoost are used to identify the features most strongly associated with prediabetes, which is expected to yield a better predictive model.

Methods: This study uses a published dataset from the UCI Machine Learning Repository consisting of 520 records, each with 16 attributes and 1 class label. The data were collected through direct questionnaires administered to patients at the Sylhet Diabetes Hospital in Sylhet, Bangladesh. The research method consists of five stages: data collection; data preprocessing; dimension reduction using PCA to reduce the dimensional complexity of the dataset; modeling using XGBoost to identify the patterns used to predict prediabetes; and model evaluation, which measures the performance of the resulting model using accuracy, recall, precision and F1-score.
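The stages listed in the Methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are a synthetic stand-in with the same shape as the dataset (520 records, 16 attributes), the label rule is hypothetical, and the 70/30 split, the standardization step and the hyperparameters are assumptions that the abstract does not specify. Only the choice of 12 components follows the reported result. If the xgboost package is not installed, the sketch falls back to scikit-learn's GradientBoostingClassifier, which is a different gradient boosting implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

try:
    from xgboost import XGBClassifier as Booster
except ImportError:
    # Assumption: fall back to scikit-learn's gradient boosting (not XGBoost)
    from sklearn.ensemble import GradientBoostingClassifier as Booster

# Synthetic stand-in for the real data: 520 records, 16 binary attributes.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(520, 16)).astype(float)
# Hypothetical label rule, used only so that the example has signal to learn.
y = (X[:, :4].sum(axis=1) + rng.normal(0.0, 0.5, size=520) > 2).astype(int)

# Assumed 70/30 stratified split; the abstract does not state the split used.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Preprocessing -> PCA (12 components) -> gradient boosting classifier.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=12),
    Booster(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
n_components = model.named_steps["pca"].n_components_
print(f"components kept: {n_components}, test accuracy: {accuracy:.4f}")
```

Wrapping the scaler, PCA and classifier in one pipeline means the projection is fitted on the training split only, so no information from the test split leaks into the dimension reduction step.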

Result: Using XGBoost with Principal Component Analysis for feature selection yields 12 features and a model accuracy of 97.44%.
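For reference, the four evaluation metrics named in the Methods (accuracy, recall, precision and F1-score) can all be derived from the counts of a binary confusion matrix. The sketch below uses only the Python standard library; the function name `classification_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting recall and precision alongside accuracy matters in a screening setting, because a missed positive case and a false alarm carry different costs that accuracy alone does not separate.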

Novelty: The study's originality lies in applying PCA as a preprocessing step to enhance the performance of machine learning models by reducing data dimensionality and focusing on the most critical features. By demonstrating how PCA can improve the efficiency and accuracy of prediabetes prediction models, this research provides valuable insights to inform future studies and contribute to the development of more effective diagnostic tools for early detection and prevention of prediabetes.

Article ID

13416

Published

06-11-2024

Section

Articles

How to Cite

Principal Component Analysis for Prediabetes Prediction using Extreme Gradient Boosting (XGBoost). (2024). Scientific Journal of Informatics, 11(3), 863-872. https://doi.org/10.15294/sji.v11i3.13416