Combining the Correlated Naive Bayes Method and Wrapper Feature Selection for Health Data Classification

Hairani Hairani(1), Muhammad Innuddin(2)

(1) Computer Science Study Program, Faculty of Engineering and Design, Universitas Bumigora
(2) Information Systems Study Program, Faculty of Engineering and Design, Universitas Bumigora

Abstract

Health data often contain many irrelevant features, which can reduce the performance of classification methods. Two health datasets with many attributes are the Pima Indian Diabetes and Thyroid datasets. Diabetes is a deadly disease characterized by elevated blood sugar due to the body's inability to produce enough insulin, and its complications can lead to heart attacks and strokes. The purpose of this research is to combine the Correlated Naive Bayes method with wrapper-based feature selection for the classification of health data. The research consists of four stages: (1) collecting the Pima Indian Diabetes and Thyroid datasets from the UCI Machine Learning Repository; (2) pre-processing the data, including transformation, scaling, and wrapper-based feature selection; (3) classification using the Correlated Naive Bayes and Naive Bayes methods; and (4) evaluating performance in terms of accuracy using 10-fold cross-validation. The results show that combining Correlated Naive Bayes with wrapper-based feature selection achieves the best accuracy on both datasets: 71.4% on the Pima Indian Diabetes dataset and 79.38% on the Thyroid dataset. This combination therefore yields better accuracy than the setting without feature selection, with increases of 4.1% for the Pima Indian Diabetes dataset and 0.48% for the Thyroid dataset.
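The pipeline summarized above (a wrapper search around a Naive Bayes classifier, scored by 10-fold cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Gaussian class-conditional likelihoods, assumes sequential forward selection as the wrapper search strategy, and assumes the "correlated" variant weights each feature's log-likelihood by the absolute Pearson correlation between that feature and the integer-encoded class label. The names `WeightedGaussianNB`, `cv_accuracy`, and `forward_selection` and the synthetic data are hypothetical; the transformation and scaling steps are omitted, and folds are not stratified.

```python
import math
import random
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


class WeightedGaussianNB:
    """Gaussian Naive Bayes. With correlated=True, each feature's log-likelihood
    is multiplied by |corr(feature, class index)| (ASSUMED weighting scheme)."""

    def __init__(self, correlated=False):
        self.correlated = correlated

    def fit(self, X, y):
        self.classes = sorted(set(y))
        cols = list(zip(*X))  # column-wise view of the training data
        codes = [self.classes.index(c) for c in y]  # class label -> integer index
        self.w = [abs(pearson(col, codes)) if self.correlated else 1.0 for col in cols]
        self.stats = {}
        for c in self.classes:
            rows = [r for r, label in zip(X, y) if label == c]
            # (mean, variance) per feature; a small constant avoids zero variance
            params = [(mean(v), pvariance(v) + 1e-9) for v in zip(*rows)]
            self.stats[c] = (len(rows) / len(X), params)
        return self

    def _log_posterior(self, c, row):
        prior, params = self.stats[c]
        s = math.log(prior)
        for w, x, (mu, var) in zip(self.w, row, params):
            s += w * (-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var))
        return s

    def predict(self, X):
        return [max(self.classes, key=lambda c: self._log_posterior(c, row)) for row in X]


def cv_accuracy(X, y, features, k=10, correlated=False, seed=0):
    """Mean k-fold cross-validation accuracy using only the given feature indices."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sub = [[row[j] for j in features] for row in X]
    accs = []
    for fold in folds:
        test = set(fold)
        train = [i for i in idx if i not in test]
        model = WeightedGaussianNB(correlated).fit([sub[i] for i in train], [y[i] for i in train])
        pred = model.predict([sub[i] for i in fold])
        accs.append(sum(p == y[i] for p, i in zip(pred, fold)) / len(fold))
    return mean(accs)


def forward_selection(X, y, correlated=False, k=10):
    """Wrapper selection: greedily add the feature that most improves CV accuracy;
    stop when no remaining feature improves it."""
    selected, best_acc = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        acc, j = max((cv_accuracy(X, y, selected + [j], k, correlated), j) for j in remaining)
        if acc <= best_acc:
            break
        selected.append(j)
        remaining.remove(j)
        best_acc = acc
    return selected, best_acc


# Hypothetical synthetic data: feature 0 separates the classes, features 1-2 are noise.
rng = random.Random(42)
X = [[(i % 2) * 3.0 + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)] for i in range(100)]
y = [i % 2 for i in range(100)]
for correlated in (False, True):
    feats, acc = forward_selection(X, y, correlated=correlated)
    print("correlated" if correlated else "plain", feats, round(acc, 3))
```

Because the classifier is a parameter of the wrapper, the same search and cross-validation protocol serves both variants, so plain Naive Bayes (`correlated=False`) and the assumed correlation-weighted variant can be compared with and without feature selection, mirroring the comparison described in the abstract.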

Keywords

Correlated Naive Bayes; Wrapper feature selection; Pima Indian Diabetes dataset; Thyroid dataset; health data
