Comparison of Probabilistic Neural Network (PNN) and k-Nearest Neighbor (k-NN) Algorithms for Diabetes Classification

  • Diah Siti Fatimah Azzahrah, Universitas Negeri Semarang
  • Alamsyah Alamsyah, Universitas Negeri Semarang
Keywords: Data Mining, Probabilistic Neural Network, k-Nearest Neighbor, Feature Selection, K-Fold Cross Validation

Abstract

Purpose: This study compares classification algorithms for diabetes in terms of their accuracy and speed.

Methods: Two algorithms are used in this study: Probabilistic Neural Network (PNN) and k-Nearest Neighbor (k-NN). The data used is the Pima Indians Diabetes Database, which contains 768 records with 8 attributes and 1 target class (0 for no diabetes, 1 for diabetes). The dataset is divided into 80% training data and 20% testing data.
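To illustrate how the two classifiers differ, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes Euclidean distance for k-NN and a Gaussian kernel with a hypothetical smoothing parameter `sigma` for PNN. The function names and the toy data are invented for illustration and are unrelated to the Pima Indians Diabetes Database.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """k-NN: majority vote among the k training points closest to `query`."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def pnn_predict(train_X, train_y, query, sigma=1.0):
    """PNN: average a Gaussian kernel over each class's training points
    and return the class with the largest average activation.
    `sigma` is an assumed smoothing parameter, not a value from the paper."""
    sums, counts = {}, {}
    for x, label in zip(train_X, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))
        sums[label] = sums.get(label, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[label] = counts.get(label, 0) + 1
    return max(sums, key=lambda c: sums[c] / counts[c])


# Made-up toy data: two well-separated classes (0 and 1).
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
           [4.0, 4.2], [4.1, 3.9], [3.8, 4.0]]
train_y = [0, 0, 0, 1, 1, 1]
```

In this sketch k-NN looks only at the k closest training points, while PNN weighs every training point by its distance to the query, which is one plausible reason for a speed difference between the two.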

Result: Accuracy is obtained after implementing k-fold cross validation with k = 4. The results show that the k-Nearest Neighbor algorithm is more accurate and faster than the Probabilistic Neural Network. The k-Nearest Neighbor algorithm obtains an accuracy of 74.6% with all features and 78.1% with four features.
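The k-fold procedure with k = 4 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: folds are contiguous and unshuffled, the abstract does not say whether cross validation runs on the full dataset or only the training portion, and the 1-NN stand-in classifier, helper names, and toy data are all hypothetical.

```python
def k_fold_indices(n_samples, k=4):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(X, y, predict, k=4):
    """Mean accuracy over k folds.
    `predict(train_X, train_y, query)` must return a class label."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        correct = sum(predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def nearest_neighbor(train_X, train_y, query):
    """1-NN stand-in classifier (hypothetical, for demonstration only)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_X]
    return train_y[dists.index(min(dists))]


# Made-up toy data with interleaved classes so every fold contains both labels.
X = [[0.0], [10.0], [0.5], [10.5], [1.0], [11.0], [0.2], [10.2]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
mean_acc = cross_val_accuracy(X, y, nearest_neighbor, k=4)
```

With 768 records and k = 4, `k_fold_indices(768, 4)` yields four folds of 192 records each; each fold serves once as the held-out set while the other three are used for fitting.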

Novelty: The novelty of this paper lies in improving accuracy by focusing on data preprocessing, feature selection, and k-fold cross validation in the classification algorithm.

Published
2023-09-29
How to Cite
Azzahrah, D. S., & Alamsyah, A. (2023). Comparison of Probabilistic Neural Network (PNN) and k-Nearest Neighbor (k-NN) Algorithms for Diabetes Classification. Recursive Journal of Informatics, 1(2), 73-82. https://doi.org/10.15294/rji.v1i2.66078
