K-Nearest Neighbor and Naive Bayes Classifier Algorithm in Determining The Classification of Healthy Card Indonesia Giving to The Poor

Health is a human right and one of the elements of welfare that must be realized by providing various health efforts to all the people of Indonesia. Poverty in Indonesia has become a national problem, and the government continues to seek ways to alleviate it. Poor families, for example, have relatively low levels of livelihood and health. One of the government's new policies, the Sakti Card Program, comprises three cards: the Indonesia Smart Card (KIP), the Healthy Indonesia Card (KIS) and the Prosperous Family Card (KKS). Determining the feasibility of receiving a Healthy Indonesia Card (KIS) requires a method with optimal accuracy. The data used in this study are 200 KIS records with 15 determinants of feasibility, collected in 2017 at the Social Service of Pekalongan Regency. The data were processed using the K-Nearest Neighbor algorithm and the combination of K-Nearest Neighbor-Naive Bayes Classifier algorithm. The K-Nearest Neighbor algorithm determined feasibility with an accuracy of 64%, while the combination of K-Nearest Neighbor-Naive Bayes Classifier algorithm reached 96%, an increase of 32 percentage points. The combination of K-Nearest Neighbor-Naive Bayes Classifier algorithm is therefore the better of the two algorithms for determining the feasibility of Healthy Indonesia Card recipients.


INTRODUCTION
Indonesia is one of the most populous countries in Asia. Economic growth has a negative effect on poverty in Indonesia (poverty falls as growth rises), whereas the unemployment rate has a positive effect on poverty, and government spending on poverty alleviation has no effect. Government policy should therefore encourage economic growth, since high economic growth can increase national income and directly raise the per capita income of every resident [1].
Poverty is a state of deficiency that no one desires [2]. Poverty can affect every level and sphere of life, from the individual to the state [3]. For example, poor families have a relatively low level of livelihood and health compared to people whose lives are sufficient [4]. Households become poor for various reasons, among others: termination of employment by an office or company, which leaves previously employed people unemployed and without income; low education and a lack of skills, which make it difficult to find work; and changes in the poverty criteria of the Central Bureau of Statistics (BPS) [5]. Poverty is multidimensional because human needs are diverse, covering both primary and secondary aspects [6].
Of the nine points of Joko Widodo's Nawacita, none relates specifically to the field of health. As a political action able to draw public attention, however, the KIS health package is considered spectacular. Health is a human right and one of the elements of welfare that must be realized by providing various health efforts to all the people of Indonesia, through health development that is of good quality and affordable to the community. Development in the health sector is directed towards achieving awareness, willingness and ability to live healthily for every resident [7]. The Sakti Card Program is intended for underprivileged Indonesians [8]. Some people think that the launch of the three Sakti ("magic") cards was politically charged, since it coincided with the planned fuel (BBM) price hike. Others support the program [9]. In general, the problems that arise in the field relate to the targeting, or categorization, of KIS recipients.
Data mining is a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information and related knowledge from large databases [10,11]. Its activities include the collection and use of historical data to find regularities, patterns or relationships in large data sets [12]. Classification assigns a new data record to one of several predefined categories (or classes), and is also called supervised learning [13]. The task of classification is to map data into class groups [14]. Data mining classification techniques can be used to determine whether a person is feasible or not feasible to receive a Healthy Indonesia Card, and the outputs of the classification can serve as knowledge. Classifying community data in this way helps determine who is feasible and who is not, objectively and accurately.
According to research [15], text classification achieved an accuracy of 86.7% with the Naive Bayes Classifier method and 87.57% with K-Nearest Neighbor (KNN). A combination of Decision Tree and Naive Bayes Classifier has been used to overcome the difficulties of continuous attributes, missing attribute values and noise in the training process; the test results achieved high detection rates and significantly reduced false positives (FP) for different types of disorders [16]. In 2013, a combination of Naive Bayes Classifier and K-Nearest Neighbor was used to predict the 12 positions of profitability of financial institutions in Bangladesh [17]. In 2004, Bayesian network and K-Nearest Neighbor algorithms were combined for data analysis, predicting cancer classes in three DNA microarray datasets, namely Colon, Leukemia and NCI-60 [18].
Based on this background, the purpose of this study is to obtain a model and to compare the accuracy of the K-Nearest Neighbor algorithm with that of the combination of K-Nearest Neighbor-Naive Bayes Classifier algorithm.

Data Collection
The data used in this study were taken randomly from the 2017 data on the feasibility of Healthy Indonesia Card (KIS) recipients at the Office of Social Affairs of Pekalongan Regency. A total of 200 records were taken: 119 records labeled feasible, meaning that the person meets the criteria for receiving KIS, and 81 records labeled not feasible, meaning that the person does not meet the criteria.

Data Processing
Data processing consists of grouping the data to determine the variables to be used, representing the data in numerical form, and dividing the data into training data and test data [19].

The Algorithm Used
In this research, a comparative analysis is carried out using two classification algorithms from data mining. The proposed algorithm is a combination of the K-Nearest Neighbor algorithm with the Naive Bayes Classifier algorithm; its result is then evaluated and validated with a confusion matrix. The next stage is to compare the accuracy and execution time of each algorithm, in order to obtain the classification model with the highest accuracy and the shortest execution time.
The combination of K-Nearest Neighbor-Naive Bayes Classifier algorithm works as follows. First, the probability value of each attribute of the data to be classified is calculated. The data with a probability greater than the threshold α are then tested using the K-Nearest Neighbor algorithm: the distance D(x, y) is calculated for each stored record, and the results are sorted to find the minimum values of D(x, y). The input is the training data, and the expected output is the prediction based on the closest distance under the K-Nearest Neighbor algorithm. Combining the two methods speeds up the K-Nearest Neighbor algorithm, because distances need not be calculated over all of the data but only over the part selected by probability. The flowchart of the combination of K-Nearest Neighbor-Naive Bayes Classifier algorithm is shown in Figure 1.
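The flow described above can be illustrated with a minimal Python sketch. It rests on several assumptions that the text does not confirm: the per-attribute probabilities are taken as already computed and passed in as `attr_prob`, the threshold α is applied to attributes (keeping those whose probability exceeds it), and the function and variable names are hypothetical. It is not the authors' Matlab implementation.

```python
from collections import Counter

def select_attributes(attr_prob, alpha):
    """Return the indices of attributes whose probability exceeds alpha."""
    return [j for j, p in enumerate(attr_prob) if p > alpha]

def squared_euclidean(x, y, attrs):
    """Squared Euclidean distance D(x, y) restricted to the given attribute indices."""
    return sum((x[j] - y[j]) ** 2 for j in attrs)

def combined_predict(train_X, train_y, query, attr_prob, alpha=0.30, n_neighbors=4):
    # Assumed Naive Bayes step: keep only attributes with probability > alpha.
    attrs = select_attributes(attr_prob, alpha)
    # K-Nearest Neighbor step: distance from the query to every stored record.
    distances = sorted(
        (squared_euclidean(row, query, attrs), label)
        for row, label in zip(train_X, train_y)
    )
    # Majority class among the n nearest records.
    nearest = [label for _, label in distances[:n_neighbors]]
    return Counter(nearest).most_common(1)[0][0]

# Toy data (hypothetical): the third attribute is filtered out by alpha.
train_X = [[1, 1, 0], [1, 1, 3], [1, 0, 100], [0, 0, 100], [0, 0, 99], [0, 1, 100]]
train_y = [1, 1, 1, 0, 0, 0]
attr_prob = [0.6, 0.5, 0.1]
prediction = combined_predict(train_X, train_y, [1, 1, 100], attr_prob, alpha=0.30, n_neighbors=3)
```

Because fewer attributes enter the distance calculation, each distance is cheaper to compute, which is one plausible reading of the speed-up claimed for the combination.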

RESULT AND DISCUSSION
The software used in this research is Matlab R2013b. Matlab supports data analysis, algorithm development, and the creation of models and applications; it can also provide a visual display for a program, which makes the program easier to use.
This study uses the dataset of healthy card recipients obtained from the Office of Social Affairs of Pekalongan Regency. The dataset contains 200 records with 15 attributes and 1 class attribute. The attributes are age, floor area of the building, type of building floor, type of wall, defecation facility, drinking water source, main household lighting source, daily cooking fuel, meat/chicken/dairy consumption per week, daily meal frequency for each household member (ART), the ability to buy new clothes for each ART within one year, the ability to pay for medical treatment at the Puskesmas/Polyclinic, household head's income, highest education of household head, and asset/saving ownership [20]. The class attribute has two values: feasible and not feasible. The data are split into 75% for the training process (150 records) and 25% for the test process (50 records).
To facilitate the mining process, attributes of categorical type are represented in numeric form as 1 and 0, that is, 1 for yes and 0 for no. The class attribute is likewise represented as 1 and 0, that is, 1 for feasible and 0 for not feasible. The data ready for the mining process can be seen in Table 1.
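As a small illustration of the 75%/25% division, the following Python sketch splits 200 records into 150 training and 50 test records. The source does not say how records were assigned to each part, so the seeded random shuffle and the name `split_records` are assumptions made only for this example.

```python
import random

def split_records(records, train_fraction=0.75, seed=0):
    """Shuffle a copy of the records and cut it into training and test parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in for the 200 KIS records (record indices only).
records = list(range(200))
train, test = split_records(records)
print(len(train), len(test))  # 150 50
```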
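The 1/0 representation can be sketched in Python as below. The field names and the example record are hypothetical and are not taken from Table 1; only the mapping itself (1 for yes or feasible, 0 otherwise) follows the text.

```python
def encode_binary(value, positive):
    """Map a categorical value to 1 if it equals the positive label, else 0."""
    return 1 if str(value).strip().lower() == positive.lower() else 0

# Hypothetical record with two yes/no attributes and the class attribute.
record = {"can_buy_new_clothes": "Yes", "can_pay_treatment": "No", "class": "feasible"}

encoded = {
    "can_buy_new_clothes": encode_binary(record["can_buy_new_clothes"], "yes"),
    "can_pay_treatment": encode_binary(record["can_pay_treatment"], "yes"),
    "class": encode_binary(record["class"], "feasible"),
}
```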

Mining process on K-Nearest Neighbor algorithm
The advantage of applying the K-Nearest Neighbor algorithm is that the training process runs faster and is more flexible, because classification is based on proximity to the existing training data [21,22]. First, the feasibility attributes are grouped into discrete data and continuous data.
After the distances are calculated with the K-Nearest Neighbor algorithm using the squared Euclidean distance, the parameter value is entered [23,24,25] and the data are processed to obtain the accuracy and execution time. The model obtained from the K-Nearest Neighbor algorithm is then tested using 75% training data and 25% test data. The resulting confusion matrix is shown in Table 2.
The accuracy value is the proportion of predictions that are correct. It can be calculated using Equation 1 [26]:
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \qquad (1) \]
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives in the confusion matrix. Applying this formula to the test data, the accuracy obtained for the K-Nearest Neighbor algorithm is 64%.
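The accuracy calculation, the proportion of correct predictions in a confusion matrix, can be sketched in Python as follows. The cell counts in the example are hypothetical: they were chosen only so that 50 test records give 64%, and they are not the values of Table 2.

```python
def accuracy(tp, tn, fp, fn):
    """Proportion of correct predictions: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total

# Hypothetical counts for 50 test records (NOT the paper's Table 2).
example = accuracy(tp=20, tn=12, fp=10, fn=8)
print(f"{example:.0%}")  # 64%
```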

Mining process on the combination of K-Nearest Neighbor-Naive Bayes Classifier algorithm
The Naive Bayes Classifier algorithm is a statistical classification method based on Bayes' theorem [27]. The Naive Bayes Classifier model has a very low error rate [28] and is known for its simple, fast, and highly accurate calculations [29]. The Naive Bayes Classifier performs better with more training data, and the training data should be as precise as possible for the result to improve [30].
In the mining process with the combination of K-Nearest Neighbor-Naive Bayes Classifier algorithm, the data are first processed with the Naive Bayes Classifier algorithm and then with the K-Nearest Neighbor algorithm.
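The Naive Bayes step can be illustrated with a short Python sketch of its two building blocks: a Gaussian density for continuous attributes and a relative frequency for categorical ones. The function names are hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the text does not state which estimator was used.

```python
import math

def mean_std(values):
    """Mean and sample standard deviation (n - 1 denominator, an assumption)."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((v - mu) ** 2 for v in values) / (n - 1)
    return mu, math.sqrt(variance)

def gaussian_density(x, mean, std):
    """Gaussian probability density of x for a class with the given mean and std."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * std)
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * std ** 2))

def categorical_probability(values, target):
    """Relative frequency of a category value within one class."""
    return sum(1 for v in values if v == target) / len(values)

# Hypothetical ages of a few records in one class.
mu, s = mean_std([30, 40, 50])
density_at_mean = gaussian_density(40, mu, s)
```

In a full classifier these per-attribute values would be combined per class together with the class prior; that combination is omitted here to keep the sketch short.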
As stated above, the data are split into 75% for the training process (150 records) and 25% for the test process (50 records). After the split, discrete data and continuous data are separated. For the continuous attributes, namely age, income of head of household, and asset/saving ownership, the mean and standard deviation are calculated for the feasible and not feasible classes. Table 3 shows the mean (μ) and standard deviation (S) for each class of the three attributes.
Using the values in Table 3, the feasibility of healthy card recipients is calculated with the Naive Bayes Classifier method: the Gaussian density formula is used for attributes of continuous type, whereas for attributes of categorical type the probability of occurrence of each value is calculated [22]. Once the probability value of each attribute is known, the next step is to select the probability values greater than alpha, for example alpha = 0.30; the attributes selected in this way are those used later in the K-Nearest Neighbor calculation. The probability value of each attribute can be seen in Table 4. The attributes whose probability value is greater than alpha are sorted, and the K-Nearest Neighbor calculation is then carried out for each record. The K-Nearest Neighbor algorithm uses the squared Euclidean distance, given in Equation 2:
\[ D(x, y) = \sum_{i=1}^{m} (x_i - y_i)^2 \qquad (2) \]
where x is the new record, y is a training record, and m is the number of attributes used.
The steps of the K-Nearest Neighbor calculation are as follows. First, specify the parameter n (the number of nearest neighbors); here n = 4. Next, calculate the distance between the new data and all records in the training data; the squared Euclidean distances between the new data and the training data can be seen in Table 5. Then sort the records by distance, starting from the smallest, and take the n records with the minimum distance. Finally, assign the class held by the majority of these n nearest neighbors.
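The K-Nearest Neighbor steps listed above (fix n, compute the squared Euclidean distance to every training record, sort, take the majority class) can be sketched in Python as follows. The training records are made up for illustration, the function names are hypothetical, and the tie-breaking behavior is a property of this sketch, not something the text specifies.

```python
from collections import Counter

def squared_euclidean(x, y):
    """Squared Euclidean distance between two records of equal length."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def rank_neighbors(train_X, train_y, query):
    """Distance table sorted from nearest to farthest: (distance, record index, class)."""
    table = [
        (squared_euclidean(row, query), idx, label)
        for idx, (row, label) in enumerate(zip(train_X, train_y))
    ]
    return sorted(table)

def majority_class(ranked, n_neighbors=4):
    """Class held by most of the n nearest records."""
    votes = Counter(label for _, _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]

# Made-up training records: 1 = feasible, 0 = not feasible.
train_X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5]]
train_y = [1, 1, 1, 0, 0, 0]
ranked = rank_neighbors(train_X, train_y, [0, 0])
predicted = majority_class(ranked, n_neighbors=4)
```

With an even n such as 4, a 2-2 vote is possible; in this sketch the tie goes to the class that appears first in the ranked table, that is, the class of the nearest tied neighbor.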
After the calculation, the data are processed to obtain the accuracy and execution time using the confusion matrix. The resulting confusion matrix is shown in Table 6. Applying the accuracy formula to the test data, the accuracy obtained for the combination of K-Nearest Neighbor-Naive Bayes Classifier algorithm is 96%. The results show that the combination improves both the accuracy and the execution time of the K-Nearest Neighbor algorithm, as can be seen in Table 7. Applying the Naive Bayes Classifier algorithm to K-Nearest Neighbor thus improves the accuracy of the K-Nearest Neighbor algorithm in determining the feasibility of Healthy Indonesia Card recipients; the Naive Bayes Classifier algorithm aims to minimize variation within an attribute so as to obtain higher accuracy than the K-Nearest Neighbor algorithm alone. Further research is expected to classify more than two classes and to add more feasibility-determining attributes in order to achieve a higher level of accuracy.

CONCLUSION
The combination algorithm can overcome the weakness of the K-Nearest Neighbor algorithm with a faster processing time, and the weakness of the Naive Bayes Classifier with a higher accuracy percentage. The accuracy of the K-Nearest Neighbor algorithm is 64%, while the combination of K-Nearest Neighbor-Naive Bayes Classifier algorithm produces 96% accuracy, an increase of 32 percentage points in determining the classification of KIS for the poor. The execution time is 0.01428 second for K-Nearest Neighbor and 0.00118 second for the combination of K-Nearest Neighbor-Naive Bayes Classifier algorithm, a reduction of 0.0131 second.
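The differences quoted in the conclusion follow directly from the reported figures. A short Python check, using only the accuracy and timing values stated above (the speed-up ratio is derived here and is not reported in the source):

```python
knn_accuracy = 64.0        # percent, K-Nearest Neighbor alone
combined_accuracy = 96.0   # percent, combination algorithm
knn_time = 0.01428         # seconds
combined_time = 0.00118    # seconds

accuracy_gain = combined_accuracy - knn_accuracy  # in percentage points
time_saved = knn_time - combined_time             # in seconds
speedup = knn_time / combined_time                # derived ratio

print(accuracy_gain)         # 32.0
print(round(time_saved, 4))  # 0.0131
print(round(speedup, 1))     # 12.1
```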