Classification Modeling with RNN-based, Random Forest, and XGBoost for Imbalanced Data: A Case of Early Crash Detection in ASEAN-5 Stock Markets
DOI:
https://doi.org/10.15294/sji.v11i3.4067Keywords:
Early crash detection, GRU, LSTM, RNN, Random forest, XGBoostAbstract
Purpose: This research aims to evaluate the performance of several Recurrent Neural Network (RNN) architectures, including Simple RNN, Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM), compared to classic algorithms such as Random Forest and XGBoost, in building classification models for early crash detection in the ASEAN-5 stock markets.
Methods: The study examines imbalanced data, which is expected due to the rarity of market crashes. It analyzes daily data from 2010 to 2023 across the major stock markets of the ASEAN-5 countries: Indonesia, Malaysia, Singapore, Thailand, and the Philippines. A market crash is the target variable when the primary stock price indices fall below the Value at Risk (VaR) thresholds of 5%, 2.5%, and 1%. Predictors include technical indicators from major local and global markets and commodity markets. The study incorporates 213 predictors with their respective lags (5, 10, 15, 22, 50, 200) and uses a time step of 7, expanding the total number of predictors to 1,491. The challenge of data imbalance is addressed with SMOTE-ENN. Model performance is evaluated using the false alarm rate, hit rate, balanced accuracy, and the precision-recall curve (PRC) score.
Result: The results indicate that all RNN-based architectures outperform Random Forest and XGBoost. Among the various RNN architectures, Simple RNN is the most superior, primarily due to its simple data characteristics and focus on short-term information.
Novelty: This study enhances and extends the range of phenomena observed in previous studies by incorporating variables such as different geographical zones and periods and methodological adjustments.