Combining the Correlated Naive Bayes Method and the Wrapper Feature Selection Method for Health Data Classification

Hairani Hairani; Muhammad Innuddin

doi:10.15294/jte.v11i2.23693


Hairani Hairani⁽¹⁾, Muhammad Innuddin⁽²⁾


(1) Computer Science Study Program, Faculty of Engineering and Design, Universitas Bumigora
(2) Information Systems Study Program, Faculty of Engineering and Design, Universitas Bumigora

Abstract

Health data often contain many irrelevant features, which can reduce the performance of classification methods. Two health datasets with many attributes are the Pima Indian Diabetes and Thyroid datasets. Diabetes is a deadly disease in which blood sugar rises because the body cannot produce enough insulin, and its complications can lead to heart attacks and strokes. The purpose of this research is to combine the Correlated Naïve Bayes method with Wrapper-based feature selection to classify health data. The research consists of four stages: (1) collecting the Pima Indian Diabetes and Thyroid datasets from the UCI Machine Learning Repository, (2) pre-processing the data through transformation, scaling, and Wrapper-based feature selection, (3) classification using the Correlated Naive Bayes and Naive Bayes methods, and (4) performance testing based on accuracy using 10-fold cross-validation. The results show that combining the Correlated Naive Bayes method with Wrapper-based feature selection achieves the best accuracy on both datasets: 71.4% on the Pima Indian Diabetes dataset and 79.38% on the Thyroid dataset. This combination therefore yields better accuracy than classification without feature selection, with an increase of 4.1% on the Pima Indian Diabetes dataset and 0.48% on the Thyroid dataset.
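The abstract describes a wrapper loop: candidate feature subsets are scored by the 10-fold cross-validation accuracy of a (Correlated) Naive Bayes classifier, and the best-scoring subset is kept. The following plain-Python sketch illustrates that loop under several assumptions that the abstract does not confirm: numeric attributes modelled with a Gaussian likelihood, class labels encoded as numbers, feature weights taken as the absolute Pearson correlation between each attribute and the label (one possible reading of "Correlated Naive Bayes"; the paper's exact weighting may differ), and greedy forward selection as the wrapper search strategy. All names (`pearson`, `WeightedGaussianNB`, `cv_accuracy`, `forward_select`) are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (sx * sy)


class WeightedGaussianNB:
    """Gaussian Naive Bayes. With correlated=True, each feature's log-likelihood is
    multiplied by |Pearson r| between that feature and the numeric class label.
    ASSUMPTION: this is an illustrative reading of "Correlated Naive Bayes",
    not the paper's confirmed formula."""

    def __init__(self, correlated=False):
        self.correlated = correlated

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior, self.stats = {}, {}
        for c in self.classes:
            rows = [x for x, t in zip(X, y) if t == c]
            self.prior[c] = len(rows) / len(X)
            # Per-feature (mean, std); the 1e-6 floor avoids division by zero.
            self.stats[c] = [(mean(col), max(pstdev(col), 1e-6)) for col in zip(*rows)]
        columns = list(zip(*X))
        self.weights = ([abs(pearson(col, y)) for col in columns]
                        if self.correlated else [1.0] * len(columns))
        return self

    def predict_one(self, x):
        def log_posterior(c):
            score = math.log(self.prior[c])
            for xi, (mu, sd), w in zip(x, self.stats[c], self.weights):
                log_pdf = -0.5 * math.log(2 * math.pi * sd * sd) - (xi - mu) ** 2 / (2 * sd * sd)
                score += w * log_pdf  # weight 1.0 gives plain Naive Bayes
            return score
        return max(self.classes, key=log_posterior)


def cv_accuracy(X, y, features, correlated, k=10, seed=0):
    """k-fold cross-validated accuracy using only the listed feature indices."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]

    def pick(i):
        return [X[i][f] for f in features]

    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = WeightedGaussianNB(correlated).fit([pick(i) for i in train],
                                                   [y[i] for i in train])
        correct += sum(model.predict_one(pick(i)) == y[i] for i in fold)
    return correct / len(X)


def forward_select(X, y, correlated, k=10):
    """Greedy forward wrapper search (ASSUMED strategy): repeatedly add the feature
    that most improves cross-validated accuracy; stop when none improves it."""
    remaining, chosen, best = list(range(len(X[0]))), [], 0.0
    while remaining:
        acc, f = max((cv_accuracy(X, y, chosen + [cand], correlated, k), cand)
                     for cand in remaining)
        if acc <= best:
            break
        chosen.append(f)
        remaining.remove(f)
        best = acc
    return chosen, best
```

Calling `forward_select(X, y, correlated=True)` returns the selected feature indices and their cross-validated accuracy, which can be compared with `cv_accuracy(X, y, list(range(len(X[0]))), correlated=True)` as the no-selection baseline, mirroring the with/without comparison reported in the abstract. Because a wrapper reruns cross-validation for every candidate subset, forward selection evaluates on the order of d² subsets for d features; this is affordable for small tables such as Pima Indian Diabetes (8 attributes).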

Keywords

Correlated Naive Bayes; Wrapper feature selection; Pima Indian Diabetes dataset; Thyroid dataset; health data




Address:

Building E11, 1st Floor, Department of Electrical Engineering, Faculty of Engineering, Universitas Negeri Semarang, Kampus Sekaran, Gunungpati, Semarang, Central Java, Indonesia, 50229.

Telp.: +62248508104
