Random Forest Algorithm Optimization using K-Nearest Neighbor and SMOTE on Diabetes Disease
DOI:
https://doi.org/10.15294/rji.v3i1.1576
Keywords:
Diabetes Disease, Random Forest, K-Nearest Neighbor, SMOTE
Abstract
Abstract. Diabetes is a chronic disease that can cause long-term damage, dysfunction, and failure of various organs in the body. It occurs when blood sugar (glucose) levels rise above normal values. Early diagnosis is crucial for managing chronic illnesses such as diabetes.
Purpose: This study aims to determine how the K-Nearest Neighbor algorithm, combined with the Synthetic Minority Oversampling Technique (SMOTE), can optimize the Random Forest algorithm for diabetes prediction.
Methods/Study design/approach: This study uses the Pima Indian Diabetes Dataset, the random forest algorithm for classification, k-nearest neighbor for optimization, and SMOTE to oversample the minority class.
Result/Findings: The model using SMOTE and k-nearest neighbor achieves a prediction accuracy of 92.86%. By comparison, the model without SMOTE and k-nearest neighbor achieves an accuracy of 83.03%.
Novelty/Originality/Value: This research shows that the random forest algorithm combined with k-nearest neighbor and SMOTE yields higher accuracy than the random forest algorithm without them.
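The abstract does not describe how SMOTE or k-nearest neighbor was configured, so the following is only a minimal sketch of the standard SMOTE idea named in the methods: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbors. The function name `smote`, its parameters, and the toy data are hypothetical and are not taken from the study or from the Pima Indian Diabetes Dataset.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbors.

    Illustrative sketch only: parameter names and defaults are assumptions,
    not the configuration used in the study.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbors at random.
        neighbor = minority[rng.choice(others[:k])]
        # Interpolate: gap in [0, 1) sets the position along the segment.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy two-feature minority class (made-up values).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_synthetic=6, k=2)
for point in new_points:
    print([round(v, 3) for v in point])
```

In practice, a pipeline of this kind would more likely rely on an existing implementation, such as `SMOTE` from imbalanced-learn together with `RandomForestClassifier` from scikit-learn, and would apply oversampling to the training split only so that synthetic samples do not leak into the evaluation data.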