High School Major Classification towards University Students Variable of Score Using Naïve Bayes Algorithm

Each institution, such as a university department, needs complete data. Data on students' former schools play an important role in student records. However, the relationship between former-school data and the students' score variables has not yet been established. This research applies a data mining classification technique, the Naïve Bayes algorithm, which can handle large volumes of data relatively quickly. With this algorithm, classifying the former high school major from the score variables achieved an accuracy of 64.77%. The researchers then optimized feature selection using Backward Elimination, which raised the accuracy to 71.71%. Performance therefore increases with feature selection, which indicates that not all score variables are related to the former school major.
Keywords: Naïve Bayes, Classification, Backward Elimination


INTRODUCTION
The use of data in education must be maximized. One way is to identify and study the attributes that influence student performance [1]. The student's background, namely the high school attended, is an important attribute because it affects course grades [2]. Studying and managing these attributes is therefore necessary for understanding course grades. The challenges here are analysing student performance, identifying the unique attributes of each student, and developing strategies and actions for the near future [3].
Data on students' former high schools can be used to determine how strongly the former school influences the students' grade variables. The data are processed until a pattern is found. Such patterns are extracted with pattern recognition techniques known as data mining [4]. Data mining is the process of obtaining information from data stored in a repository using pattern recognition, statistical, and mathematical techniques [5]. Classification, meanwhile, is the process of finding a function or model that distinguishes the classes of data [6]. The resulting model is then used to predict unknown class labels. Pandey and Pal [7] studied a Bayes classification algorithm to estimate student performance, that is, whether students would complete their studies well. In addition, Bharawaj and Pal [8] studied student performance using 300 participants from 5 different universities in the Bachelor of Computer Application (BCA) program.
The classification method used was Bayesian, with 17 attributes. The results identified factors such as the final high school examination grade, the student's address, the learning process, the mother's qualification, the student's activities, the family's annual income, and the family's status. These factors are believed to influence students' academic achievement.
Following the previous studies above, this research uses the Naïve Bayes algorithm as its classification technique. It was chosen because Naïve Bayes is a probabilistic prediction technique based on the rules of Bayes' theorem. The algorithm is also known for its strong independence assumption on the features: the presence of a feature in the data is assumed to be unrelated to the presence of any other feature in the same data [9]. Naïve Bayes is a classification algorithm with high computation speed, and it can also handle high-dimensional datasets [10]. Its weakness is that correlation between attributes decreases the performance of Naïve Bayes classification [9][10].
Optimization through feature selection, such as Backward Elimination and Forward Selection, is needed to increase the performance of Naïve Bayes. In this research, the researchers use Backward Elimination to remove irrelevant attributes [5].

METHODS
This research was conducted as an experiment through the following steps, as shown in Figure 1 [9]:

Data Collecting
Data were obtained from the student academic section (course scores) and the admission section (students' former schools). The data covered 1030 students from the years 2010-2012.

Preliminary Data Processing
In this step, the researchers simplified the data so that they were well defined and could be used with the chosen algorithm or method. The steps were as follows:
1. Data integration combined the data into a single storage medium: identity data and academic data were compiled in one place.
2. Data reduction removed records and attributes that were not needed, such as records in which the manually entered former-school data were unidentified.
In data reduction, unnecessary attributes such as Grade Point Average (GPA), general competence subjects, and supporting competence subjects were eliminated. Records with unidentified former-school data, which students had not entered manually, were also eliminated because the former-school variable was used as the class label in the classification. Preliminary data processing produced 489 valid records with 25 attributes, consisting of course grades and supporting competence subject grades, with former school as the label variable.
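The two preprocessing steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names (`student_id`, `former_major`, `gpa`, `calculus_1`) and the toy records are hypothetical, since the paper does not list its schema, and the study itself used RapidMiner rather than this code.

```python
# Hypothetical field names and toy records; the paper does not list its schema.
def integrate(identity_rows, academic_rows, key="student_id"):
    """Data integration: merge identity and academic records that share a key."""
    academic_by_id = {row[key]: row for row in academic_rows}
    merged = []
    for row in identity_rows:
        grades = academic_by_id.get(row[key])
        if grades is not None:
            merged.append({**row, **grades})
    return merged


def reduce_data(rows, drop_attributes, label="former_major"):
    """Data reduction: drop unneeded attributes and records without a class label."""
    cleaned = []
    for row in rows:
        if not row.get(label):  # unidentified former school: cannot serve as a label
            continue
        cleaned.append({k: v for k, v in row.items() if k not in drop_attributes})
    return cleaned


identity = [
    {"student_id": 1, "former_major": "science"},
    {"student_id": 2, "former_major": ""},  # label missing, will be removed
    {"student_id": 3, "former_major": "social"},
]
academic = [
    {"student_id": 1, "gpa": 3.4, "calculus_1": "A"},
    {"student_id": 2, "gpa": 2.9, "calculus_1": "B"},
    {"student_id": 3, "gpa": 3.1, "calculus_1": "C"},
]

dataset = reduce_data(integrate(identity, academic), drop_attributes={"gpa"})
print(dataset)
```

Records without a label are dropped before modelling because a supervised classifier cannot learn from them.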

Naïve Bayes Algorithm
Naïve Bayes is an application of Bayes' theorem. The algorithm rests on the simplifying assumption that the attributes are unrelated to one another [11]. Bayesian classification is also a statistical classifier used to predict the probability of class membership [5]. It offers high accuracy and high computation speed when applied to high-dimensional data.
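To make the independence assumption concrete, the following is a minimal sketch of a Naïve Bayes classifier for categorical attributes such as letter grades. It is not the RapidMiner implementation used in the study; the class name `CategoricalNaiveBayes`, the Laplace (add-one) smoothing, and the toy grade data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n_features = len(rows[0])
        # value_counts[(feature index, class)][value] -> frequency in training data
        self.value_counts = defaultdict(Counter)
        self.values = [set() for _ in range(self.n_features)]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def log_posteriors(self, row):
        """Unnormalised log P(class) + sum_i log P(x_i | class) for each class."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            for i, value in enumerate(row):
                seen = self.value_counts[(i, label)][value]
                score += math.log((seen + 1) / (count + len(self.values[i])))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Toy data: two course grades per student, label = former high school major.
rows = [("A", "A"), ("A", "B"), ("B", "A"), ("C", "B"), ("C", "C"), ("B", "C")]
labels = ["science", "science", "science", "social", "social", "social"]

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("A", "A")), model.predict(("C", "C")))
```

Each attribute contributes its own conditional probability term, which is why training needs only one pass of counting and stays fast on many attributes.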
Naïve Bayes assumes that the features are unrelated to one another. Its predictions are based on Bayes' theorem, whose standard form is given as Formula (1) [9]. The feature set X = {X1, X2, X3, …, Xq} consists of q attributes, that is, q dimensions.
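For reference, Bayes' theorem and its form under the independence assumption can be written as below. The feature set X = {X1, …, Xq} follows the text; the class symbol Y is an assumption, since the notation of the original Formula (1) is not available here.

```latex
% Bayes' theorem for a class label Y and feature set X = {X_1, ..., X_q}
P(Y \mid X) = \frac{P(Y)\, P(X \mid Y)}{P(X)}

% Under the naive independence assumption the likelihood factorises
P(X \mid Y) = \prod_{i=1}^{q} P(X_i \mid Y)
\quad\Longrightarrow\quad
P(Y \mid X) = \frac{P(Y) \prod_{i=1}^{q} P(X_i \mid Y)}{P(X)}
```

Because P(X) is the same for every class, a classifier only needs to compare the numerators and pick the class with the largest value.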

Backward Elimination
Backward Elimination removes irrelevant attributes [5].
e. Removing a predictor and regressing Y on the remaining predictors is repeated until no predictor has a significance value larger than Pout.
f. If no predictor has Fpartial < F(1,,out), the model is chosen as the best model.
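The general idea of backward elimination can be sketched as a greedy loop. Note that this sketch swaps the partial F-test criterion described above for a caller-supplied score (for example, cross-validated accuracy), so it is a wrapper-style variant and not the exact procedure of the paper. The function `toy_score` and the feature names are hypothetical.

```python
def backward_elimination(features, score, min_gain=0.0):
    """Greedy backward elimination.

    features: list of feature names.
    score: caller-supplied function mapping a list of features to a number
           (higher is better), e.g. cross-validated accuracy.
    """
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Score every subset obtained by dropping exactly one feature.
        candidates = [
            (score([f for f in selected if f != drop]), drop) for drop in selected
        ]
        candidate_score, drop = max(candidates, key=lambda c: c[0])
        if candidate_score - best <= min_gain:
            break  # no single removal improves the score, so stop
        selected.remove(drop)
        best = candidate_score
    return selected, best


def toy_score(subset):
    """Hypothetical score: useful features add 1.0, every other feature costs 0.1."""
    useful = {"x1", "x2"}
    gain = sum(1.0 for f in subset if f in useful)
    penalty = 0.1 * sum(1 for f in subset if f not in useful)
    return gain - penalty


selected, best = backward_elimination(["x1", "x2", "noise1", "noise2"], toy_score)
print(selected, best)
```

Starting from the full attribute set and removing one attribute at a time keeps interacting attributes together for as long as they help the score.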

Experiment and Model Testing
The experiment and model testing in this research were carried out as follows:
a. Preparing the data to be used in the experiment.
b. Pre-processing by removing the unidentified data in the former-school column.
c. Implementing data mining with the RapidMiner software to build a Naïve Bayes classification model with Backward Elimination. RapidMiner is free software for data mining and machine learning.
d. Testing the Naïve Bayes model by obtaining the classification accuracy from a confusion matrix such as Table 1. Cells a and d are correct classifications, in which the prediction matches the actual result. Cell b is a wrong classification because the model predicts no (negative) but the actual result is yes (positive). Cell c is also a wrong classification because the prediction is yes (positive) but the actual result is no (negative) [5]. From the confusion matrix, the accuracy is the share of correct classifications: Accuracy = (a + d) / (a + b + c + d)

Table 1. Confusion matrix: case of two class models
e. Analysing the results of the Naïve Bayes algorithm with Backward Elimination feature selection.
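The accuracy computation from a two-class confusion matrix can be illustrated as follows. The text states only that cells a and d are correct and describes b and c; treating a as "actual yes, predicted yes" and d as "actual no, predicted no" is an assumption based on the usual table layout, and the labels below are toy values.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Return (a, b, c, d) following the cell naming used in the text."""
    a = b = c = d = 0
    for truth, guess in zip(actual, predicted):
        if truth == positive and guess == positive:
            a += 1  # correct: actual yes, predicted yes (assumed meaning of a)
        elif truth == positive:
            b += 1  # wrong: actual yes, predicted no
        elif guess == positive:
            c += 1  # wrong: actual no, predicted yes
        else:
            d += 1  # correct: actual no, predicted no (assumed meaning of d)
    return a, b, c, d


def accuracy(a, b, c, d):
    """Share of correctly classified records."""
    return (a + d) / (a + b + c + d)


actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
cells = confusion_counts(actual, predicted)
print(cells, accuracy(*cells))
```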

RESULTS AND DISCUSSION
The pre-processed data were validated using cross-validation with k=10. Before cross-validation, the attributes were reduced by feature selection with backward elimination. The data before backward elimination are shown in Figure 2. The next step was reducing the data with backward elimination, which produced the reduced data shown in Figure 3. The data validated with k=10 cross-validation were then tested using naïve Bayes, with the confusion matrix shown in Figure 5.
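As an illustration of k=10 cross-validation, the sketch below splits record indices into ten folds and holds each fold out once. It is a plain random split written for this example; how RapidMiner samples its folds is not stated in the text, and `majority_baseline` is a hypothetical placeholder classifier, not the naïve Bayes model of the study.

```python
import random
from collections import Counter


def k_fold_indices(n_records, k=10, seed=0):
    """Shuffle record indices and deal them into k nearly equal folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, labels, train_and_predict, k=10, seed=0):
    """Return overall accuracy when every fold is held out exactly once."""
    correct = 0
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        test_rows = [rows[i] for i in test_idx]
        predictions = train_and_predict(train_rows, train_labels, test_rows)
        correct += sum(p == labels[i] for p, i in zip(predictions, test_idx))
    return correct / len(rows)


def majority_baseline(train_rows, train_labels, test_rows):
    """Placeholder classifier: always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * len(test_rows)


folds = k_fold_indices(489, k=10)  # 489 valid records, as reported in the text
rows = [(i,) for i in range(40)]   # toy data, not the study's records
labels = ["science"] * 30 + ["social"] * 10
acc = cross_validate(rows, labels, majority_baseline, k=10)
print(sorted(len(f) for f in folds), acc)
```

Every record is used for testing exactly once, so the reported accuracy does not depend on a single lucky or unlucky split.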

Figure 5. Confusion matrix of the naïve Bayes algorithm
According to Figure 5, the accuracy is 64.77%. It can be concluded that the naïve Bayes algorithm alone is not optimal, so feature selection is recommended. The feature selection applied to the data is backward elimination, with the result shown in Figure 6.

CONCLUSION
Some course grades are influenced by the students' educational background, namely their major in their former school (high school). This is shown by the increased performance of the naïve Bayes algorithm when tested with backward elimination. The performance increased by 6.94 percentage points.

Figure 2. Student data before feature selection

Figure 3. Student data after reduction by backward elimination

According to Figures 2 and 3, the attributes were reduced from 25 to 21. It can be concluded that the 4 removed attributes are irrelevant. The removed variables are the grades for Calculus 1, Probability and Statistics, Theory of Language and Automata, and Artificial Science. The next step was testing the validity of the data using cross-validation with k=10, as shown in Figure 4.

Figure 4. Use of cross-validation with k=10

Figure 6. Confusion matrix of the naïve Bayes algorithm with backward elimination

According to Figure 6, backward elimination on the naïve Bayes algorithm yields 71.71%, an increase in performance of 6.94 percentage points from the optimization. The results are summarized in Table 2.
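The size of the improvement follows directly from the two reported accuracies. The relative gain in the second line is derived here from those same figures and is not a number reported in the study.

```latex
% Absolute gain in accuracy (percentage points)
71.71\% - 64.77\% = 6.94 \text{ percentage points}

% Relative gain over the baseline accuracy
\frac{71.71 - 64.77}{64.77} = \frac{6.94}{64.77} \approx 0.107 \approx 10.7\%
```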

Table 2. Comparison of testing results