Modified Mixed Effects Random Forest in Small Area Estimation Using PCA and Rotation Forest with Correlated Auxiliary Variables

Authors

  • Rizki Ananda, IPB University
  • Khairil Anwar Notodiputro, IPB University
  • Muhammad Nur Aidi, IPB University

DOI:

https://doi.org/10.15294/sji.v11i3.10633

Keywords:

Tree-based method, Generalized linear mixed models, Multicollinearity, Poverty, Statistics Indonesia, Jambi Province

Abstract

Purpose: Per capita expenditure data for Jambi Province, Indonesia, suffer from severe multicollinearity among the auxiliary variables. This study develops small area estimation (SAE) methods that remain effective under this condition, because reliable small-area estimates are essential for formulating comprehensive regional development policies in Jambi Province. We modify the mixed effects random forest (MERF) method in two ways to handle multicollinearity: PCA-MERF, which applies principal component analysis (PCA) to the auxiliary variables before fitting MERF, and MERoF, which replaces the standard random forest inside MERF with a rotation forest. The data come from the March 2021 National Socioeconomic Survey (Susenas) and the 2021 Village Potential (PODES) data collection. The methods were evaluated using metrics including root mean square error (RMSE), relative root mean square error (RRMSE), and coefficient of variation (CV), as well as their ability to capture random area effects. The random effect block (REB) bootstrap was used to obtain MSE estimates for assessing the quality of area-level estimates.
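The two modifications differ mainly in how the correlated auxiliary variables are transformed before the trees are grown. The NumPy sketch that follows is illustrative only and assumes details this abstract does not state: for PCA-MERF, PCA on standardised variables; for MERoF, a simplified regression form of rotation forest (random disjoint feature groups, PCA fitted on a bootstrap sample of rows per group, all components kept). The function names `pca_scores` and `rotation_matrix` are hypothetical, the data are simulated, and the MERF estimation loop that would consume these features is not shown.

```python
import numpy as np


def pca_scores(X, k):
    """Hypothetical PCA-MERF-style preprocessing (assumed): standardise the
    auxiliary variables and return the first k principal-component scores."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of vt are the principal axes (eigenvectors of the correlation matrix).
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


def rotation_matrix(X, n_subsets, rng, sample_frac=0.75):
    """Hypothetical helper: one rotation-forest rotation matrix (simplified
    regression sketch, assumed details).

    Features are split at random into n_subsets disjoint groups; for each
    group, PCA is fitted on a bootstrap sample of the rows and all of its
    loadings are written into that group's block of a p x p matrix.
    """
    n, p = X.shape
    groups = np.array_split(rng.permutation(p), n_subsets)
    R = np.zeros((p, p))
    for g in groups:
        rows = rng.choice(n, size=max(2, int(sample_frac * n)), replace=True)
        Xg = X[np.ix_(rows, g)]
        Xg = Xg - Xg.mean(axis=0)
        # full_matrices=True keeps every component, so each block is orthogonal.
        _, _, vt = np.linalg.svd(Xg, full_matrices=True)
        R[np.ix_(g, g)] = vt.T
    return R


# Simulated data: six auxiliary variables driven by one common factor,
# so they are strongly correlated (illustration only, not Susenas/PODES).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = z + 0.1 * rng.normal(size=(200, 6))

S = pca_scores(X, k=2)                        # forest inputs in a PCA-MERF-style fit
R = rotation_matrix(X, n_subsets=2, rng=rng)
X_rot = X @ R                                 # inputs for one tree in a MERoF-style fit
```

Because every component is kept, `R` is orthogonal, so the rotation reorients the feature space without discarding information, whereas `pca_scores` with a small `k` also reduces dimension. In a full rotation forest, each tree would receive its own independently drawn rotation matrix.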

Result: MERoF outperformed both MERF and PCA-MERF, particularly in unit-level (village) estimation, and it also captured variation between subdistricts better than either of them. At the area (subdistrict) level, however, PCA-MERF performed better than both MERF and MERoF. All three methods performed acceptably, with RRMSE and CV values between 8% and 10%, indicating precise and reliable predictions of per capita expenditure in small areas. These modifications to MERF are therefore effective and advantageous for small area estimation on datasets with significant multicollinearity.
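One common way to compute the reported accuracy measures is sketched here in plain Python. The exact formulas are an assumption, since this abstract does not state them: RRMSE is taken as RMSE divided by the mean observed value, and CV as the square root of an area's estimated MSE (for example, a REB bootstrap MSE) divided by its point estimate, both expressed in percent. The helper names and the numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def rrmse_percent(actual, predicted):
    """RMSE relative to the mean observed value, in percent (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    return 100 * rmse(actual, predicted) / mean_actual


def cv_percent(estimate, mse_estimate):
    """Coefficient of variation of one area estimate, in percent, computed from
    its estimated MSE, e.g. a bootstrap MSE (assumed definition)."""
    return 100 * math.sqrt(mse_estimate) / estimate


# Hypothetical numbers for illustration only, not results from the study.
print(rmse([100, 200], [110, 190]))          # 10.0
print(cv_percent(1_000_000, 8_100_000_000))  # 9.0
```

Under these assumed definitions, a CV of 9% would fall inside the 8% to 10% range reported for all three methods.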

Novelty: This research introduces a new semi-parametric, tree-based SAE approach that improves the precision of per capita expenditure estimates, supports better-informed regional policy decisions, and fills a gap in current SAE methodology.

Author Biographies

  • Rizki Ananda, IPB University

    Rizki Ananda, SST

  • Khairil Anwar Notodiputro, IPB University

    Prof. Dr. Ir. Khairil Anwar Notodiputro, MS

  • Muhammad Nur Aidi, IPB University

    Prof. Dr. Ir. Muhammad Nur Aidi, MS


Article ID

10633

Published

30-08-2024

Issue

Vol. 11 No. 3 (2024)

Section

Articles

How to Cite

Modified Mixed Effects Random Forest in Small Area Estimation Using PCA and Rotation Forest with Correlated Auxiliary Variables. (2024). Scientific Journal of Informatics, 11(3), 705-720. https://doi.org/10.15294/sji.v11i3.10633