Optimization of C4.5 Algorithm Using K-Means Algorithm and Particle Swarm Optimization Feature Selection on Breast Cancer Diagnosis

  • Anita Ayu Septiantina, Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Negeri Semarang, Semarang, Indonesia
  • Endang Sugiharti, Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Negeri Semarang, Semarang, Indonesia
Keywords: C4.5 Algorithm Optimization, K-Means, PSO, Breast Cancer

Abstract

Large datasets require methods for extracting information that can support problem solving; data mining is one such method. In medicine, data mining is useful for diagnosing diseases such as breast cancer. Data mining offers several techniques for uncovering hidden patterns in data, one of which is classification with the C4.5 algorithm. The C4.5 algorithm has been shown to give better results than other decision tree algorithms. Because accuracy is central to classification, optimization is needed to improve it. Here, the C4.5 algorithm is optimized by using the K-Means algorithm to cluster continuous data and Particle Swarm Optimization (PSO) for feature selection. This research aims to determine how this accuracy optimization works in the C4.5 algorithm and what accuracy it achieves in breast cancer diagnosis. The research uses the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the UCI Machine Learning Repository. The proposed method achieves an average accuracy of 97.894%, compared with 94.152% for the C4.5 algorithm alone. Experiments with the proposed method thus increased classification accuracy by 3.742 percentage points.
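To make the discretization step concrete, the following is a minimal sketch, not the authors' implementation. It clusters one continuous feature with a one-dimensional K-Means and then scores the resulting discrete feature with gain ratio, the split criterion C4.5 uses. The function names, the centroid initialization, the choice of k = 2, and the toy values are all assumptions made for illustration; they are not taken from the paper or from the WDBC dataset, and the PSO feature selection step is omitted.

```python
import math
from collections import Counter


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on one continuous feature; returns a cluster index per value."""
    ordered = sorted(values)
    # Assumed deterministic initialization: spread centroids over the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature, target):
    """C4.5 gain ratio of a discrete feature with respect to the class labels."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    # Expected class entropy after splitting on the feature.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(target) - remainder
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical, not from WDBC): one continuous feature and a class label.
values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
target = ["B", "B", "B", "M", "M", "M"]

bins = kmeans_1d(values, k=2)
print(bins, gain_ratio(bins, target))
```

In this toy case the two clusters coincide with the two classes, so the discretized feature reaches the maximum gain ratio. On real data the number of clusters per feature is a tuning choice that the abstract does not specify.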

Published
2020-04-30
How to Cite
Septiantina, A., & Sugiharti, E. (2020). Optimization of C4.5 Algorithm Using K-Means Algorithm and Particle Swarm Optimization Feature Selection on Breast Cancer Diagnosis. Journal of Advances in Information Systems and Technology, 2(1), 51-60. https://doi.org/10.15294/jaist.v2i1.44368
Section
Articles