Application of Discretization and Information Gain on Naïve Bayes to Diagnose Heart Disease
Abstract
In the health sector, large amounts of data are available for processing and analysis. Current technology can process such data to predict or diagnose disease. Diagnosing a disease requires patient medical records or other health data collected in the past, and processing these data requires a method known as data mining. Data mining offers several methods, one of which is classification, and a widely used classification algorithm is Naïve Bayes. The accuracy of Naïve Bayes can be improved by applying discretization and information gain. The purpose of this study was to evaluate the effect of applying discretization and information gain to Naïve Bayes on a heart disease dataset. The data used in this study is a heart disease dataset obtained from the UCI Machine Learning Repository, consisting of 270 instances and 14 features. The mining process uses k-fold cross-validation with k = 10. Naïve Bayes alone achieved an accuracy of 85.1852%, while Naïve Bayes with discretization and information gain achieved 85.5556%. This gain of 0.3704 percentage points over plain Naïve Bayes results from removing attributes based on information gain, combined with discretization, before Naïve Bayes classification.
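The pipeline summarized above (discretization, information-gain attribute selection, Naïve Bayes, 10-fold cross-validation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the discretization method, the information-gain cutoff, or whether folds are stratified, so equal-width binning (`n_bins`), keeping the `top_k` highest-gain attributes, and plain shuffled folds are hypothetical choices, as are all function and parameter names. Cut points and information gain are computed on each training fold only, so the held-out fold does not influence preprocessing.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy reduction in `labels` after splitting on a discrete `feature`."""
    n = len(labels)
    remainder = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += count / n * entropy(subset)
    return entropy(labels) - remainder


def fit_bins(column, n_bins):
    """Equal-width binning (an assumed choice): return (lower bound, bin width)."""
    lo, hi = min(column), max(column)
    return lo, ((hi - lo) / n_bins) or 1.0  # constant column -> width 1


def apply_bin(value, lo, width, n_bins):
    """Map a continuous value to a bin index, clipped to 0..n_bins-1."""
    return min(max(int((value - lo) // width), 0), n_bins - 1)


class CategoricalNB:
    """Naive Bayes over discrete attributes with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        # +1 reserves probability mass for attribute values unseen in training
        self.n_values = [len(set(col)) + 1 for col in zip(*X)]
        self.counts = defaultdict(Counter)  # (class, attribute index) -> value counts
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(c, j)][v] += 1
        return self

    def predict_one(self, row):
        def log_posterior(c):
            nc = self.class_counts[c]
            score = math.log(nc / self.n)  # log prior P(c)
            for j, v in enumerate(row):  # add log P(x_j = v | c)
                score += math.log((self.counts[(c, j)][v] + 1) / (nc + self.n_values[j]))
            return score
        return max(self.class_counts, key=log_posterior)


def cross_validate(X, y, k=10, n_bins=5, top_k=8, seed=0):
    """k-fold CV accuracy of discretization + information-gain selection + Naive Bayes."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # plain (non-stratified) folds: an assumption
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        # Fit cut points on the training fold only, then apply them to both folds.
        params = [fit_bins(col, n_bins) for col in zip(*(X[i] for i in train_idx))]

        def discretize(indices):
            return [[apply_bin(v, lo, w, n_bins) for v, (lo, w) in zip(X[i], params)]
                    for i in indices]

        X_train, y_train = discretize(train_idx), [y[i] for i in train_idx]
        gains = [information_gain([r[j] for r in X_train], y_train) for j in range(len(params))]
        # Keep the top_k attributes by information gain (hypothetical selection rule).
        keep = sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)[:top_k]
        model = CategoricalNB().fit([[r[j] for j in keep] for r in X_train], y_train)
        for row, i in zip(discretize(test_idx), test_idx):
            correct += model.predict_one([row[j] for j in keep]) == y[i]
    return correct / len(y)
```

For example, `cross_validate(X, y, k=10)` with `X` as a list of numeric attribute rows and `y` as the class labels returns the pooled accuracy over all ten folds. As a sanity check on the reported figures: with 270 instances, 230/270 ≈ 85.1852% and 231/270 ≈ 85.5556%, so the 0.3704-point improvement corresponds to one additional correctly classified instance.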