Implementation of Random Forest with Synthetic Minority Oversampling Technique and Particle Swarm Optimization for Predicting Survival of Heart Failure Patients
Abstract
Heart failure is caused by a disruption in the heart’s muscle wall that leaves the heart unable to pump enough blood to meet the body’s needs. The increasing prevalence and mortality rates of heart failure can be reduced through early disease detection using data mining. Data mining helps discover and interpret patterns in processed information to support decision-making, and it has been applied in many fields, including healthcare. Classification is one of the data mining techniques used for prediction.
Purpose: This research applies the Synthetic Minority Oversampling Technique (SMOTE) and Particle Swarm Optimization (PSO) to the Random Forest classification algorithm to predict the survival of heart failure patients, and measures the resulting accuracy.
Methods/Study design/approach: We predict the survival of heart failure patients with the Random Forest classification algorithm, handling class imbalance with SMOTE and selecting features with PSO on the Heart Failure Clinical Records Dataset. The data mining process consists of three distinct phases.
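To illustrate the imbalance-handling step, the following is a minimal sketch of the core SMOTE idea: each synthetic point is interpolated between a minority-class sample and one of its k nearest minority-class neighbours. It is not the implementation used in this research. The function name `smote`, its parameters, the random choice of base sample, and the toy data are all assumptions made for illustration.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    Simplified sketch: the base sample is drawn at random for each new
    point, and neighbours are found by brute-force Euclidean distance.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of base, excluding base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical two-feature minority class, for illustration only.
toy_minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.5, 2.5]]
new_points = smote(toy_minority, n_synthetic=4, k=2)
```

Because every synthetic point lies on a line segment between two existing minority samples, the new points stay inside the region the minority class already occupies. Oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.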
Result/Findings: Applying SMOTE and PSO to the Heart Failure Clinical Records Dataset in the Random Forest classification process yielded an accuracy of 93.9%, whereas Random Forest without SMOTE and PSO reached only 88.33%, a difference of 5.57 percentage points. This indicates that the proposed combination improves the performance of the classification algorithm and achieves higher accuracy than previous research.
Novelty/Originality/Value: Data imbalance and irrelevant features in the Heart Failure Clinical Records Dataset significantly affect the classification process. This research therefore applies SMOTE to balance the data and PSO to select features before classifying with the Random Forest algorithm.
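PSO-based feature selection can be sketched as a binary particle swarm in which each particle is a 0/1 mask over the features. The abstract does not state which PSO variant, parameters, or fitness function were used, so everything below is an assumption for illustration: the sigmoid-based binary update, the values of `w`, `c1` and `c2`, and the toy fitness function. A realistic fitness would score a mask by the classifier's validation accuracy on the selected features.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=10, n_iter=30,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a 0/1 feature mask that maximises fitness(mask)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # best mask seen by each particle
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # best mask seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp the velocity
                # The sigmoid of the velocity is the probability of the bit being 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def toy_fitness(mask):
    """Hypothetical stand-in for classifier accuracy: +1 for each selected
    feature in an assumed 'relevant' set, -1 for every other selected feature."""
    relevant = {0, 2, 5}
    return sum(1 if d in relevant else -1 for d, bit in enumerate(mask) if bit)


best_mask, best_score = binary_pso(toy_fitness, n_features=6)
```

The returned mask is the best one the swarm visited, not a guaranteed optimum. Swapping `toy_fitness` for a function that trains and validates a Random Forest on the masked features would connect this sketch to the pipeline described above.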