Variable Selection on Sensus Pertanian 2013 to Determine Relevant Variables for Agricultural GRDP 2013 Using Partial Least Squares Regression
Partial least squares regression (PLSR) is a method for overcoming multicollinearity in data. It can also be applied to high-dimensional data, i.e., data in which the number of variables is much larger than the number of observations. PLSR is commonly used in chemometrics, climatology, ecology, biology, and medical research. This paper applies PLSR to agricultural data. The Gross Regional Domestic Product (GRDP) of the agricultural sector in 2013 is used as the dependent variable, and 590 variables from Sensus Pertanian 2013 are used as independent variables. The observations are the provinces of Indonesia. Variable selection is used to identify the relevant variables that influence the response, following two approaches: the first applies Variable Importance in the Projection (VIP) to the original data (PLSR-VIPa) and to the standardized data (PLSR-VIPz); the second uses the Least Absolute Shrinkage and Selection Operator (LASSO) to select variables before PLSR is applied to the data (L-PLSR). The analysis shows that PLSR-VIPa explains the variance in the data better than the other models while using fewer independent variables.
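As a rough illustration of VIP-based selection, the sketch below fits a single-response PLSR with the NIPALS algorithm and computes one VIP score per variable. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function name `pls1_vip`, the `standardize` flag (meant to mirror the distinction between original and standardized data), the number of components, the cut-off VIP > 1 (a common rule of thumb), and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def pls1_vip(X, y, n_components, standardize=False):
    """Fit PLS1 by NIPALS and return one VIP score per column of X.

    standardize=False works on centered data only; standardize=True also
    scales each column to unit variance (columns must not be constant).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X - X.mean(axis=0)
    y = y - y.mean()
    if standardize:
        X = X / X.std(axis=0, ddof=1)
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))  # unit-norm weight vectors
    ss = np.zeros(n_components)           # y-variation explained per component
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w                          # score vector
        tt = t @ t
        p = X.T @ t / tt                   # X loading
        q = (y @ t) / tt                   # y loading
        X = X - np.outer(t, p)             # deflate X
        y = y - q * t                      # deflate y
        W[:, a] = w
        ss[a] = q**2 * tt
    # VIP_j = sqrt(p * sum_a(ss_a * w_ja^2) / sum_a(ss_a))
    return np.sqrt(n_vars * (W**2 @ ss) / ss.sum())

# Synthetic data only (not the census data): 30 observations, 200 variables,
# with the response driven by the first five variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 200))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=30)

vip = pls1_vip(X, y, n_components=3)
selected = np.flatnonzero(vip > 1.0)  # assumed cut-off: keep variables with VIP > 1
```

Because the squared VIP scores average to one by construction, a threshold of 1 separates variables with above-average importance from the rest; other cut-offs are equally defensible and would change the size of the selected set.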