Machine Learning Model Using Extreme Gradient Boosting (XGBoost) Feature Importance and Light Gradient Boosting Machine (LightGBM) to Improve Accurate Prediction of Bankruptcy
Abstract
Humans are limited in how much data they can process and analyze in a short time, and bankruptcy data is no exception. Bankruptcy data carries complex information, so analyzing it quickly and efficiently requires technological support. Data science technology enables large-scale data processing and analysis through parallel processing techniques, which can be implemented in machine learning models.
Purpose: This study aims to implement a machine learning model using the Light Gradient Boosting Machine (LightGBM) classification algorithm, optimized with Extreme Gradient Boosting (XGBoost) Feature Importance, to increase the accuracy of bankruptcy prediction.
Methods/Study design/approach: Bankruptcy prediction is carried out with LightGBM as the classification model, optimized by using the XGBoost algorithm as a Feature Importance technique to improve model accuracy. The dataset used is the Taiwanese Bankruptcy dataset, collected from the Taiwan Economic Journal for 1999 to 2009, which contains 6,819 records. Because the dataset is imbalanced, this study applies random oversampling.
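The two preprocessing ideas named above, random oversampling and importance-based feature selection, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's implementation: the helper names `random_oversample` and `keep_top_features`, the toy data, the importance scores, and the cutoff `k` are all hypothetical. In practice the scores would come from a fitted XGBoost model and the selected columns would be passed to a LightGBM classifier.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the majority size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def keep_top_features(rows, importances, k):
    """Keep the k columns with the highest importance scores, in original column order."""
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    keep = sorted(ranked[:k])
    return [[row[i] for i in keep] for row in rows], keep


# Toy data (hypothetical): 8 non-bankrupt (0) and 2 bankrupt (1) companies, 3 features each.
X = [[i, i * 2, i % 3] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y, seed=42)
class_counts = Counter(y_bal)

# Hypothetical importance scores, standing in for those of a fitted XGBoost model.
scores = [0.10, 0.65, 0.25]
X_sel, kept = keep_top_features(X_bal, scores, k=2)
```

One design choice to decide explicitly is where oversampling happens relative to the train/test split. Oversampling only the training portion keeps duplicated minority rows out of the test set; the abstract does not state which order the study used.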
Result/Findings: Model testing with a confusion matrix showed that LightGBM with XGBoost Feature Importance achieved an accuracy of 99.227%.
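For reference, accuracy is derived from a binary confusion matrix as the share of correct predictions among all predictions. The sketch below shows that arithmetic, along with precision and recall for the positive (bankrupt) class. The function name `metrics_from_confusion` and the cell counts are made up for illustration and are not the study's results.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Compute accuracy, precision, and recall from the four cells of a binary confusion matrix."""
    total = tn + fp + fn + tp
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Illustrative counts only (hypothetical, not taken from the study).
result = metrics_from_confusion(tn=90, fp=5, fn=2, tp=3)
```

On imbalanced data, accuracy can stay high even when few minority cases are detected, so reading precision and recall from the same matrix gives a fuller picture.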
Novelty/Originality/Value: The results indicate that XGBoost Feature Importance can be used to improve LightGBM's performance in bankruptcy prediction.