Modified Mixed Effects Random Forest in Small Area Estimation Using PCA and Rotation Forest with Correlated Auxiliary Variables
DOI:
https://doi.org/10.15294/sji.v11i3.10633
Keywords:
Tree-based method, Generalized linear mixed models, Multicollinearity, Poverty, Statistics Indonesia, Jambi Province
Abstract
Purpose: Per capita expenditure data for Jambi Province, Indonesia, suffer from severe multicollinearity among the auxiliary variables. To address this, the study developed a small area estimation (SAE) method suited to such data, since reliable small-area estimates are essential for formulating comprehensive regional development policies in Jambi Province. By modifying the mixed effects random forest (MERF) method, we introduced PCA-MERF (which applies principal component analysis before MERF) and MERoF (which replaces the standard random forest with a rotation forest) to handle multicollinearity more effectively. Data from the March 2021 National Socioeconomic Survey (Susenas) and the 2021 Village Potential (PODES) data were used. The methods were evaluated using root mean square error (RMSE), relative root mean square error (RRMSE), coefficient of variation (CV), and their ability to capture random area effects. The random effect block (REB) bootstrap approach was used to obtain MSE estimates for assessing the quality of the area-level estimates.
Result: MERoF outperformed both MERF and PCA-MERF, particularly in unit-level (village) estimation, and was better at capturing variation between subdistricts. At the area (subdistrict) level, PCA-MERF performed better than MERF and MERoF. All three methods showed acceptable performance, with RRMSE and CV values between 8% and 10%, indicating precise and reliable predictions of per capita expenditure in small areas. These modifications to MERF proved effective for small area estimation in datasets with substantial multicollinearity.
Novelty: This research introduces a novel semi-parametric, tree-based SAE approach, enhancing the precision of per capita expenditure estimates and supporting more informative regional policy decisions, thus filling a gap in current SAE methodologies.
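The evaluation metrics named in the abstract (RMSE, RRMSE, CV) can be illustrated with a minimal sketch. The abstract does not state the exact formulas used in the study, so the definitions below are common textbook forms and should be read as assumptions: RRMSE is taken as RMSE divided by the mean of the observed values, and CV as the square root of an MSE estimate (for example, one obtained from a bootstrap) divided by the point estimate, both expressed in percent. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def rrmse_percent(y_true, y_pred):
    """RMSE relative to the mean observed value, in percent (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    return 100.0 * rmse(y_true, y_pred) / mean_true


def cv_percent(estimate, mse_estimate):
    """Coefficient of variation of an area estimate, in percent.

    Assumed definition: sqrt(estimated MSE) / point estimate * 100,
    where the MSE estimate could come from a bootstrap procedure.
    """
    return 100.0 * math.sqrt(mse_estimate) / estimate


# Toy illustration with made-up numbers (not data from the study).
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]

print(rmse(observed, predicted))           # 10.0
print(rrmse_percent(observed, predicted))  # 5.0
print(cv_percent(200.0, 400.0))            # 10.0
```

Under these assumed definitions, values in the 8% to 10% range reported in the abstract would mean that the typical error is roughly a tenth or less of the quantity being estimated.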