A Minimum Error-Based PCA for Improving Classifier Performance in Detecting Financial Fraud

Bayu Nur Pambudi(1), Silmi Fauziati(2), Indriana Hidayah(3)


(1) Financial Transaction Reports and Analysis Center
(2) Department of Electrical and Information Engineering, Universitas Gadjah Mada
(3) Department of Electrical and Information Engineering, Universitas Gadjah Mada

Abstract

The main challenge for data mining approaches to fraud detection in financial transaction data is class imbalance: in the available datasets, the fraud class is far smaller than the non-fraud class. This imbalance lowers the F1-score because precision and recall become unbalanced, so a model can predict one class well while performing poorly on the other. Long training times and high computational resource requirements are a further concern when applying data mining, so handling imbalanced data alone is not enough to produce the expected performance. Reducing the data dimensions can speed up processing, but it tends to reduce the classifier's performance. This study aims to improve the performance of a data mining approach based on the Support Vector Machine (SVM) classifier for detecting fraudulent financial transactions. The SVM was refined by tuning its kernel and hyperparameters and was integrated with Random Under Sampling (RUS) and our Minimum error-based Principal Component Analysis (MebPCA). RUS was used to handle the imbalanced data, while MebPCA modifies the dimension reduction technique based on classification errors to shorten computation time without degrading the performance of the SVM. Compared with previous research on SVM-based fraud detection, this combination improves precision by 29.31% and F1-score by 19.8%, and reduces training time by 36.39%.
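The two ideas the abstract relies on, the F1-score under class imbalance and random under-sampling, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `f1_report` and `random_under_sample`, the toy data, and the prediction counts are all assumptions made for illustration, and neither MebPCA nor the SVM tuning is shown.

```python
import random
from collections import Counter


def f1_report(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows so every class is as small as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    sampled = []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            sampled.append((row, label))
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [l for _, l in sampled]


# Hypothetical toy data: 990 non-fraud (0) and 10 fraud (1) transactions.
labels = [0] * 990 + [1] * 10
rows = [[float(i)] for i in range(1000)]

# Hypothetical predictions: 9 of 10 frauds caught, but 81 false alarms raised.
y_pred = [1] * 81 + [0] * 909 + [1] * 9 + [0] * 1
precision, recall, f1 = f1_report(labels, y_pred)

# Under-sample the majority class so both classes have 10 rows each.
balanced_rows, balanced_labels = random_under_sample(rows, labels)
class_counts = Counter(balanced_labels)
```

In this toy case recall is high (0.9) but precision is low (0.1), so the F1-score is only 0.18, which is the kind of imbalance between precision and recall the abstract describes. After under-sampling, both classes contain 10 rows.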

Keywords

data mining; financial fraud detection; MebPCA; RUS; SVM
