Optimization of Classification Accuracy Using K-Means and Genetic Algorithm by Integrating C4.5 Algorithm for Diagnosis Breast Cancer Disease

  • Fachrizal Ahdy Andoyo, Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Negeri Semarang, Semarang, Indonesia
  • Riza Arifudin, Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Negeri Semarang, Semarang, Indonesia
Keywords: Data Mining, K-Means, Genetic Algorithm, Decision Tree, C4.5 Algorithm, Wisconsin Diagnostic Breast Cancer

Abstract

Technological development has caused data to proliferate, and this data must be processed into valid information for daily needs. Data mining is a technique for converting data into useful information, and it has been widely used for prediction in fields such as health and medical science. This study uses the Wisconsin Diagnostic Breast Cancer dataset from the UCI Machine Learning Repository, which has 32 attributes and 569 samples. The data is continuous and high-dimensional, which makes the C4.5 algorithm require long computation time and extensive storage. This study aims to improve the accuracy of C4.5 by combining it with K-Means and a Genetic Algorithm. The results compare the accuracy of the C4.5 algorithm for diagnosing breast cancer before and after applying the combination of K-Means and the Genetic Algorithm. The accuracy of C4.5 alone is 91.228%. After optimization with K-Means and the Genetic Algorithm, the accuracy is 94.824%, with an average of 22 selected features. Thus, applying K-Means and the Genetic Algorithm to the C4.5 algorithm improves the accuracy of diagnosing breast cancer by 3.596 percentage points.
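
The abstract does not spell out how the three methods are wired together. The following is a minimal sketch of one plausible arrangement, not the authors' implementation: it assumes K-Means is used to discretize each continuous attribute into a few bins, and that the Genetic Algorithm searches over binary feature masks. All names (`kmeans_1d_bins`, `ga_select`, `toy_fitness`, `INFORMATIVE`) and all parameter values are hypothetical. The fitness function here is a toy stand-in; in an actual experiment it would presumably be the accuracy of a C4.5 tree trained on the discretized, selected features.

```python
import random


def kmeans_1d_bins(values, k=3, iters=20):
    """Discretize one continuous feature: return a cluster index (0..k-1) per value."""
    lo, hi = min(values), max(values)
    # Spread the initial centroids evenly over the value range (deterministic).
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return labels


def ga_select(n_features, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Return the best feature mask (list of 0/1) found by a simple generational GA."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament selection from the current population.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Hypothetical toy fitness: pretend features 0, 2 and 5 are the informative ones
# and charge a small cost per selected feature. A real fitness would instead
# train and score a decision tree on the selected, discretized columns.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


bins = kmeans_1d_bins([1.0, 1.1, 0.9, 5.0, 5.2, 9.8, 10.0], k=3)
mask = ga_select(n_features=8, fitness=toy_fitness)
print("bins:", bins)
print("selected features:", [i for i, g in enumerate(mask) if g])
```

Keeping the fitness function as a plain callable means the tree learner can be swapped in without touching the search loop, and the fixed `seed` makes a run repeatable.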

Published
2021-04-14
How to Cite
Andoyo, F., & Arifudin, R. (2021). Optimization of Classification Accuracy Using K-Means and Genetic Algorithm by Integrating C4.5 Algorithm for Diagnosis Breast Cancer Disease. Journal of Advances in Information Systems and Technology, 3(1), 1-8. https://doi.org/10.15294/jaist.v3i1.49011
Section
Articles