A Robust Data Envelopment Analysis for Evaluating Technical Efficiency of Indonesian High Schools

The main purpose of this study is to evaluate the technical efficiency of high school education in Indonesia by applying Data Envelopment Analysis (DEA), the most frequently used method for measuring efficiency scores. Because traditional DEA may produce biased results, this study uses a robust approach, which also remains reliable for estimating technical efficiency when outliers contaminate the data set. Statistical data on general senior secondary schools for the 2015/2016 period are analyzed, using 34 provinces as decision-making units (DMUs), with eight input and six output variables. The results indicate that the average efficiency score of Indonesia's provinces in managing high school education is 0.936. Furthermore, 32.35 percent of provinces perform efficiently, with an efficiency score equal to one, while 17 provinces have above-average efficiency scores. The results also indicate that efficiency scores from robust data envelopment analysis provide better accuracy. Overall, robust data envelopment analysis (RDEA) is appropriate for measuring the efficiency of provincial performance in organizing secondary education.


INTRODUCTION
High schools are very important educational institutions because they prepare students to face the real world by providing them with the necessary skills and knowledge to be able to live independently and co-exist in the community in a proper manner. Completing the high school level of education may help students in the next stage of their lives, whether they decide to go to college or take the first steps in their careers. Moreover, it is difficult to find a decent job without a high school diploma, as educational attainment is usually considered to be an absolute requirement in securing a job. In Indonesia, education level is an important consideration in certain positions that require appropriate skills. Therefore, graduation from high school is generally regarded as a minimum requirement for further education or for direct entry into work.
Measuring the efficiency of high school education is therefore important. This study aims to measure the efficiency of Indonesia's provinces in organizing high school education by using the DEA method. A robust approach is used to handle possible outliers in the actual data set, as it tends to provide more accurate results. Knowing the efficiency of each province in managing high school education allows the Indonesian government to identify the best role models of high school management and so improve the efficiency of other provinces. This study is expected to contribute to Indonesian education, especially high school education, and to enhance the quality of high school graduates so that they are ready to meet further challenges, whether in college or in going straight to work.
Unfortunately, research on high school efficiency measurement in Indonesia is limited; no study on this exact topic has been found. Fatimah & Mahmudah (2017) measure the efficiency of elementary schools in Indonesia using two-stage data envelopment analysis. Other studies use Indonesian high schools as samples for other kinds of evaluation. Yusrizal et al. (2017) investigate the level of knowledge and understanding of physics teachers in senior high schools in Banda Aceh when developing and analyzing test items. Based on a sample of 32 physics teachers, their results indicate that the teachers' skills are not satisfactory.
Efficiency measurement was introduced by Farrel (1957), and one of the most commonly used methods is DEA, a non-parametric method that performs frontier analysis to estimate the efficiency scores of DMUs. This method allows comparisons between DMUs in order to establish which are performing efficiently. An efficient DMU has an efficiency score exactly equal to one, which is equivalent to an efficiency of 100 percent; otherwise, the DMU is said to be inefficient.
Many studies have applied DEA methods to evaluate the efficiency of educational institutions (see, amongst others: Carrington et al., 2005; Kong & Fu, 2012; Nazarko & Saparauskas, 2014; and Mikusova, 2015). Barrow (1991) applied stochastic frontier analysis to estimate the stochastic cost frontier of schools in England, while Bonesrønning & Rattsø (1994) analyzed the efficiency of high schools in Norway. Moreover, Cooper & Cohn (1997) studied the technical efficiency of school districts in South Carolina. Although the DEA method has notable strengths in the analysis of frontier production, its estimator has complex and multidimensional properties.
The existence of outliers makes the traditional DEA method sensitive, because it relies on the best-performing DMUs. The presence of outliers may therefore make the results of the analysis less accurate. To deal with this problem in traditional DEA, Cooper et al. (1998) and Gstach (1998) use stochastic DEA, but this approach usually requires classical assumptions about the statistical distribution. Further, Wilson (1995) suggests an approach for detecting outliers in DEA; other studies that used this procedure are Charnes et al. (1992) and Zhu (1996). Furthermore, Bertsimas & Sim (2003) analyze the DEA method using robust optimization. This study, in contrast, applies the robust approach introduced by Simar & Wilson (1998) for estimating bias-corrected technical efficiency scores. The statistical data on Indonesian high schools for 2015/2016 are analyzed using an R program. Because science and technology influence so many areas, this study focuses on the science field of study in Indonesian high schools. As noted by Dwianto et al. (2017), students in Indonesia lag behind in science achievement, and there are some weaknesses in the science learning process.

METHODS
DEA is a non-parametric method for measuring the performance of DMUs by comparing units that are homogeneous, i.e. that use the same kinds of input variables to produce the same kinds of outputs. Its main advantage over other measurement methods is that it requires no distributional assumptions, which are needed in a parametric analysis. The DEA method consists of two models: the CRS (constant returns to scale) model and the VRS (variable returns to scale) model. The CRS model was introduced by Charnes et al. (1978), which is why it is often called the CCR model, whereas the VRS model was developed by Banker et al. (1984) and is also known as the BCC model. The second model is an extension of the first. The two differ in their initial assumption: the CRS model assumes that an increase in the input variables yields a proportional increase in the output variables, and it presumes that DMUs perform at an optimal scale. The VRS model does not assume proportionality, so the increases in input and output variables may differ, and it presumes that DMUs do not necessarily perform at an optimal scale.
The naive DEA score can be described as follows. Let the observed input variables be defined by $x_j \in \mathbb{R}^N_+$ and the output variables by $y_j \in \mathbb{R}^M_+$, where $j = 1, \ldots, J$. Suppose that the production set is $P = \{(x, y) : x \text{ can produce } y\}$. The input set $L(y) = \{x : (x, y) \in P\}$ then contains the input variables that can produce the output variables $y$ under $P$ (Shephard, 1981; Coelli et al., 1994). Following Besstremyannaya et al. (2015), the input-oriented CRS model of DEA for DMU $j$, where $j = 1, \ldots, J$, can be written as follows:
\[ \hat{\theta}_j = \min_{\theta, \lambda} \theta \]
subject to
\[ y_j \le \sum_{k=1}^{J} \lambda_k y_k, \qquad \theta x_j \ge \sum_{k=1}^{J} \lambda_k x_k, \]
and
\[ \theta > 0, \qquad \lambda_k \ge 0 \quad \text{for all } k. \]
The model assumes that $P$ is strictly convex and that the input and output variables are disposable. Strong disposability means that when $(x, y) \in P$ and $x' \ge x$, $y' \le y$, then $(x', y') \in P$. Further, the additional constraint $\sum_{k=1}^{J} \lambda_k = 1$ is needed to turn this model into the VRS model.
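To make the input-oriented CRS score concrete, the following is a minimal sketch for the special case of one input and one output per DMU, where the linear program reduces to comparing each unit's output/input ratio with the best ratio in the sample. This is an illustration only: the study itself uses eight inputs and six outputs and an R program, which requires a linear-programming solver. The function name `crs_efficiency` and the four data points are hypothetical and are not taken from the study.

```python
def crs_efficiency(inputs, outputs):
    """Input-oriented CRS (CCR) efficiency for ONE input and ONE output per DMU.

    In this special case the DEA linear program has a closed form:
    theta_j = (y_j / x_j) / max_k (y_k / x_k).
    The general multi-input, multi-output case needs an LP solver instead.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    if any(x <= 0 for x in inputs) or any(y <= 0 for y in outputs):
        raise ValueError("inputs and outputs must be strictly positive")

    # Productivity ratio (output per unit of input) of every DMU.
    ratios = [y / x for x, y in zip(inputs, outputs)]

    # The DMU with the highest ratio defines the CRS frontier.
    best = max(ratios)

    # Each score is the fraction of its input a DMU would need
    # if it operated at the frontier ratio.
    return [r / best for r in ratios]


# Hypothetical data for four DMUs (not from the study).
inputs = [10.0, 20.0, 30.0, 40.0]
outputs = [5.0, 12.0, 15.0, 16.0]

scores = crs_efficiency(inputs, outputs)
for dmu, score in enumerate(scores, start=1):
    print(f"DMU {dmu}: efficiency = {score:.3f}")
```

A score of one marks a DMU on the frontier; a score below one is the share of its current input that the DMU would need to produce the same output at frontier productivity.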
Because this method is frontier-based, it needs accurate and precise input and output variables to produce good results; even a slight change in the data can change the estimates significantly.
Even though the DEA method is very widely used and powerful, precise and accurate data are required to yield unbiased efficiency scores. However, most researchers have difficulty obtaining accurate real data because the input and output variables are subject to uncertainty. To deal with this uncertainty, this study applies the bootstrap, which approximates the sampling distribution of an estimator through its empirical distribution. The bootstrap is used to correct for bias, as the estimated boundary of the input set may fail to include the most efficient DMU. For each DMU $j$, the bias of the naive score $\hat{\theta}_j$ is $\text{bias}_j = E(\hat{\theta}_j) - \theta_j$, which is estimated by $\widehat{\text{bias}}_j = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^*_{jb} - \hat{\theta}_j$. The following steps are required:
Step 1: Estimate the naive DEA scores in equation (2), defined by $\hat{\theta}_j$, $j = 1, \ldots, J$.
Step 2: Repeat $B$ times to provide $J$ sets of bootstrap estimates $\{\hat{\theta}^*_{jb}\}_{b=1}^{B}$.
Step 3: Calculate $\widehat{\text{bias}}_j$ for $j = 1, \ldots, J$.
Step 4: Calculate the bias-corrected efficiency scores by using $\hat{\hat{\theta}}_j = \hat{\theta}_j - \widehat{\text{bias}}_j$.
Simar and Wilson (2007) note that this bias correction for the input-oriented model, expressed in terms of the distance score $\delta_j$, which is the reciprocal of $\theta_j$, presumes that the scores do not depend on the environmental variables $z_j$. Environmental variables, i.e. input variables that are not controlled by producers, can be handled by the following procedure:
Step 1: Estimate the naive distance scores, which are defined as $\hat{\delta}_j = 1 / \hat{\theta}_j$, where $j = 1, \ldots, J$.
Step 2: Assume that the naive distance scores follow $\hat{\delta}_j = z_j \beta + \varepsilon_j$, where $\varepsilon_j \sim N(0, \sigma^2)$ with left truncation at $1 - z_j \beta$.
Step 3: Calculate $\hat{\beta}$ and $\hat{\sigma}$ by truncated regression, with the condition $\hat{\delta}_j > 1$.
Step 4: Repeat $B$ times to provide $J$ sets of bootstrap estimates.
Step 6: Calculate the bias-corrected scores by using $\hat{\hat{\delta}}_j = \hat{\delta}_j - \widehat{\text{bias}}_j$.
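The arithmetic of the bias correction in the first procedure (Steps 3 and 4) can be sketched as follows. The sketch assumes that the bootstrap replicates of a DMU's score have already been produced by a suitable resampling scheme, such as the smoothed bootstrap of Simar & Wilson (1998), which is not implemented here. The function name `bias_corrected_score` and the numbers are hypothetical and serve only as an illustration.

```python
from statistics import mean


def bias_corrected_score(naive_score, bootstrap_scores):
    """Return (bias-corrected score, estimated bias) for one DMU.

    bias      = mean of bootstrap replicates - naive score
    corrected = naive score - bias
    """
    if not bootstrap_scores:
        raise ValueError("at least one bootstrap replicate is required")

    # Bootstrap estimate of the bias of the naive DEA score.
    bias = mean(bootstrap_scores) - naive_score

    # Subtracting the estimated bias gives the bias-corrected score.
    corrected = naive_score - bias
    return corrected, bias


# Hypothetical naive score and bootstrap replicates (B = 5 for brevity;
# the study uses B = 100, 500 and 1000).
naive = 0.936
replicates = [0.95, 0.97, 0.98, 0.96, 0.99]

corrected, bias = bias_corrected_score(naive, replicates)
print(f"estimated bias = {bias:.3f}, bias-corrected score = {corrected:.3f}")
```

Because the naive input-oriented score tends to overstate efficiency, the estimated bias is typically positive and the corrected score lies below the naive one.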

RESULTS AND DISCUSSION
The study uses high school statistics for 2015/2016 prepared by the Center for Educational and Cultural Data and Statistics, Secretariat General, Ministry of Education and Culture of the Republic of Indonesia. The data present a general description of high schools in Indonesia, covering the number of schools, applicants, new entrants, students, repeaters, graduates, headmasters, teachers, classes and classrooms. The data are based on the results of the verification and validation carried out by the Secretariat Directorate General of Primary and Secondary Education and the Directorate of Senior Secondary Schools through basic educational data.
The calculation of efficiency scores using the DEA method is made for all provinces in Indonesia. The 34 provinces are used as DMUs, which are analyzed using six output variables (O) and eight input variables (I). Because the study considers high schools from the point of view of the Science field of study, the output variables are the average national exam scores in that field: Indonesian language (O1), English language (O2), mathematics (O3), physics (O4), chemistry (O5) and biology (O6). Table 1 gives a general description of all the variables used in this study. Table 2 shows the cumulative distribution of the efficiency scores for traditional DEA. The results indicate that the average efficiency score of Indonesia's provinces in managing high school education is 0.936, with a standard deviation of 0.065. Figure 1 shows the traditional DEA efficiency scores for all provinces in Indonesia.

Figure 1. Efficiency Scores
As many as 32.35 percent of provinces perform efficiently, with an efficiency score equal to one. Further, 17 provinces (50 percent) have efficiency scores above the average. The lowest efficiency score is 0.785, for West Nusa Tenggara. Based on the discrimination stages introduced by Thanassoulis et al. (1987), it is safe to report that the province of South Kalimantan should be able to sustain its activities and produce an optimum output using only 99.9 percent of the available inputs, whereas the province of North Sumatra should be able to carry out its activities with optimum results using 98.5 percent of the existing resources. The Special Capital Region of Jakarta requires 96.9 percent of the input variables to be able to carry out its activities.
As previously mentioned, traditional DEA tends to provide biased efficiency scores. Besides, the actual data can be contaminated by outliers, which are present in all the input variables. This study therefore applies a robust approach of bias-corrected technical efficiency to the DEA scores. Figures 2, 3 and 4 compare the efficiency scores for the 34 provinces in Indonesia under traditional DEA and robust DEA, with the number of bootstrap replications B = 100, 500 and 1000. The confidence intervals for the bias-corrected DEA scores are computed at alpha = 0.01, 0.02 and 0.05.
Figures 2, 3 and 4 indicate that the efficiency scores of RDEA deliver consistent results, whose values always follow the technical efficiency scores of traditional DEA. The average bias of the naive DEA scores is small and is the same, 0.04, for all confidence levels (alpha). Descriptive statistics of the RDEA efficiency scores are as follows. The average efficiency scores with B = 100 and alpha = 0.01, 0.02 and 0.05 are 0.898, 0.899 and 0.900; with B = 500 they are 0.900, 0.900 and 0.899; and with B = 1000 they are 0.899, 0.899 and 0.900. Further, the standard deviation equals 0.05 for all numbers of replications and all confidence levels. Moreover, the results indicate that the RDEA efficiency scores for B = 100, 500 and 1000 and alpha = 0.01, 0.02 and 0.05 lie within the confidence intervals.
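The figures reported in this section can be cross-checked with a few lines of arithmetic. The block below is only a reader's consistency check under the usual input-oriented reading of a score $\theta$ (a DMU could produce its current output with a share $\theta$ of its inputs, i.e. a potential input saving of $1 - \theta$). The count of 11 efficient provinces is the one stated in the conclusion, and the last line is a rough comparison of means, not necessarily how the reported average bias was computed.

```latex
% Share of efficient provinces and of provinces above the average score
\frac{11}{34} \approx 0.3235 = 32.35\%, \qquad \frac{17}{34} = 0.50 = 50\%

% Potential input saving 1 - \theta implied by the reported scores
\text{South Kalimantan: } 1 - 0.999 = 0.001 \;(0.1\%), \quad
\text{North Sumatra: } 1 - 0.985 = 0.015 \;(1.5\%), \quad
\text{Jakarta: } 1 - 0.969 = 0.031 \;(3.1\%)

% Gap between the traditional DEA mean and the RDEA means (0.898 to 0.900)
0.936 - 0.900 = 0.036, \qquad 0.936 - 0.898 = 0.038, \qquad \text{both} \approx 0.04
```

The gap between the two means is of the same size as the reported average bias of 0.04, which is what one would expect if the bias-corrected scores are obtained by subtracting the estimated bias from the naive scores.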

CONCLUSION
This study used robust data envelopment analysis (RDEA) to investigate efficiency measurement when the actual data are contaminated by outliers, as the efficiency scores of traditional DEA are susceptible to bias. The 34 provinces were used as DMUs, and their efficiencies were measured using eight input variables and six output variables to evaluate technical efficiency in managing high school education in Indonesia.
The results show that traditional DEA gives 11 provinces efficiency scores equal to one, which indicates efficient performance. West Nusa Tenggara has the lowest efficiency score (0.785), while fifty percent, or 17 provinces, have efficiency scores above the average. The study uses B = 100, 500 and 1000 bootstrap replications, with confidence intervals computed at alpha = 0.01, 0.02 and 0.05. The RDEA results reveal that its efficiency scores always follow the technical efficiency scores of traditional DEA and lie within the confidence intervals. Further, the average bias is small (0.04) for all confidence levels. Overall, RDEA is appropriate for measuring the efficiency of provincial governments in organizing high school education in Indonesia.