Multilayer Perceptron Optimization on Imbalanced Data Using SVM-SMOTE and One-Hot Encoding for Credit Card Default Prediction

ABSTRACT


Introduction
The development of technology has provided many benefits in various fields of life, including the economy. One product of this development in the economic sector is non-cash payment using credit cards. A credit card is a card issued to users as a means of payment for the purchase of goods and services (Raj and Portia, 2011). Credit cards are one of the most popular payment methods. In addition to their positive impact, the growing number of credit card users also creates problems for banks as issuers. One problem that often occurs is credit risk, namely the event in which credit card users fail to fulfill their obligations to the bank (Pertiwi et al., 2020). This credit risk arises when banks apply lending standards that are too loose, without analyzing the credibility of potential users (Yüksel, 2018). Chen et al. (2020) explained that managing credit risk is important for banks in order to reduce the risk of losing loan money to credit card users who default (fail to pay). To overcome this problem, a bank can conduct a credit assessment by creating a model that classifies prospective customers into certain classes (Louzada et al., 2016).
Many algorithms can be used for the classification modeling process. One of them is the artificial neural network (ANN), which tends to perform better on large datasets (Osisanwo et al., 2017). One commonly used type of ANN is the multilayer perceptron (MLP), as in the research conducted by Yildirim (2017), Faris et al. (2016), and Neagoe et al. (2018), who used MLP for classification tasks.
Several previous studies have applied MLP to credit card default prediction. Pasha et al. (2017) and Koklu (2016) tested the Taiwan default of credit card clients dataset using a multilayer perceptron and obtained fairly high accuracies of 81.7% and 81.049%, respectively. Despite the high accuracy, these studies did not use an evaluation metric that accounts for imbalanced data.
Binary classification cases that use real-world datasets are prone to imbalanced data, which biases the model's accuracy so that it cannot represent the model's real ability to distinguish the two classes. This happens because a model trained on a dataset with an unbalanced class distribution tends to focus only on the majority class and ignore the minority class (Alam et al., 2020). Another study (Vishwakarma et al., 2021) that used MLP for classification on the same dataset as Pasha et al. and Koklu produced an AUC score of 0.5. The AUC score is a method for measuring a model's performance on imbalanced datasets (Vo et al., 2021). That result indicates that the model cannot differentiate the two classes properly.
High accuracy combined with a low AUC score indicates that the model suffers from bias-to-majority: the dataset is imbalanced and is handled by the classification algorithm without any proper optimization phase. To cope with this problem, oversampling using the synthetic minority oversampling technique (SMOTE) can be applied (Suksut et al., 2019). SMOTE has several development variants, one of which is SVM-SMOTE (Zhang et al., 2019). In addition to oversampling, the encoding of the dataset also needs to be considered, because the original dataset uses ordinal encoding on all categorical features, both ordinal and nominal. Using ordinal encoding on features with nominal categorical values can lead to an artificial ordering bias. To avoid this problem, one-hot encoding (OHE) can be used to improve the ability of the classification algorithm (Potdar, 2017). Based on the problems mentioned above, this research focuses on optimizing the multilayer perceptron using SVM-SMOTE and OHE to overcome imbalanced data.

Method
This stage describes the methods applied in this study: OHE for transforming nominal categorical features, min-max scaling for numeric and ordinal categorical features, and SVM-SMOTE for oversampling minority class instances. These techniques are used to optimize the multilayer perceptron in classifying credit card users who default. The flowchart of the proposed method is shown in Figure 1.

Dataset
This study uses a public dataset as the object of the research experiment. The dataset, titled "default of credit card clients", was obtained from the UCI machine learning repository and originates from a bank in Taiwan. It was produced by I-Cheng Yeh of the Department of Information Management, Chung Hua University, Taiwan. The dataset contains 30,000 records with 23 features and one class feature.
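As an illustration, the dataset can be loaded directly from the XLS file distributed by the UCI repository; the file name and the header-row position below are assumptions based on that distribution, not details stated in this paper.

```python
import pandas as pd

# Load the UCI "default of credit card clients" dataset.
# Assumption: the XLS file as distributed by UCI, whose second row
# holds the actual column names (LIMIT_BAL, SEX, EDUCATION, ...,
# default payment next month).
df = pd.read_excel("default of credit card clients.xls", header=1)

X = df.drop(columns=["ID", "default payment next month"])  # 23 features
y = df["default payment next month"]                       # class feature

print(X.shape)                          # expected: (30000, 23)
print(y.value_counts(normalize=True))   # shows the class imbalance
```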

Preprocessing
The preprocessing stage includes two processes: data cleaning and data splitting. Data cleaning is the process of removing errors from the data, which can take the form of missing entries or incorrect entries. Some causes of these errors are input mistakes, large amounts of data entered manually, and records that are too expensive to obtain (Aggarwal, 2015). The next process is data splitting. For large amounts of data, several sample sets can be made for evaluation, namely training and testing datasets (Kuhn and Johnson, 2013). The training dataset is the sample used to create the model, while the testing or validation dataset is the sample used to evaluate the model's performance. In this study the dataset is split into two parts with the proportions shown in Table 1.
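A minimal sketch of the splitting step is shown below; the 80/20 proportion and the use of stratification are illustrative assumptions, since the actual proportions used are those in Table 1.

```python
from sklearn.model_selection import train_test_split

# Split the cleaned dataset into training and testing sets.
# Assumption: an 80/20 proportion; the paper's actual split is in Table 1.
# Stratifying on y keeps the class ratio identical in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```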

Encoding and Normalization
The data encoding and normalization stage transforms the data in the dataset into a numeric form that makes it easier for the MLP algorithm to create a model in the learning phase. The dataset used in this study has 23 features and one class feature whose values are numeric and categorical, as shown in Table 2. Since ordinal encoding on nominal features can introduce an artificial ordering bias (Potdar, 2017), the categorical features with unordered values are encoded using OHE. The other features keep the ordinal encoding of the original dataset, and the data is then normalized using min-max scaling. OHE works by transforming a single feature with n observations and d distinct values into d binary variables with n observations. OHE changes the values to 0 and 1: for a feature x with values {a, b, c}, where x1 = a, x2 = b, and x3 = c, the technique maps x to (1, 0, 0), (0, 1, 0), and (0, 0, 1). This process is shown in Table 3 and Table 4. In this study, the features transformed by OHE are X2, X3, and X4. Min-max scaling is a data normalization technique that rescales the original data into a fixed range, generally 0 to 1 [21]. In this study, features with numerical and ordinal categorical values are normalized using this technique.
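A sketch of this step using scikit-learn is given below; the column names are assumptions for illustration (in the dataset, X2, X3, and X4 correspond to sex, education, and marital status).

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Nominal categorical features encoded with OHE (X2 = sex,
# X3 = education, X4 = marital status); names assume the UCI header.
nominal = ["SEX", "EDUCATION", "MARRIAGE"]
# All remaining features keep their numeric/ordinal values and are
# rescaled to [0, 1] with min-max scaling: x' = (x - min) / (max - min).
ordinal_numeric = [c for c in X_train.columns if c not in nominal]

encoder = ColumnTransformer([
    ("ohe", OneHotEncoder(handle_unknown="ignore"), nominal),
    ("minmax", MinMaxScaler(), ordinal_numeric),
])

# Fit the transformations on the training data only, then apply them
# to both splits so no information leaks from the test set.
X_train_enc = encoder.fit_transform(X_train)
X_test_enc = encoder.transform(X_test)
```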

SVM-Synthetic minority over-sampling technique
Synthetic minority over-sampling technique (SMOTE) is an oversampling technique used to deal with imbalanced dataset problems by modifying the training dataset to produce a balanced class distribution. One development of SMOTE, carried out by Nguyen, Cooper, and Kamei, is the SVM-SMOTE technique (Luo et al., 2019). SVM-SMOTE utilizes interpolation, extrapolation, and an SVM to create new synthetic instances; the steps of this process are described in Nguyen et al. (2011).
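A minimal sketch of this step using the SVMSMOTE implementation from the imbalanced-learn library is shown below; the neighbour settings are library defaults and an assumption here, since the paper does not report its SVM-SMOTE parameters.

```python
from imblearn.over_sampling import SVMSMOTE

# Oversample only the training split; the test set must keep the
# original class distribution so the evaluation stays realistic.
# Assumption: default SVMSMOTE parameters (k_neighbors=5,
# m_neighbors=10), as the paper does not list its settings.
oversampler = SVMSMOTE(random_state=42)
X_train_bal, y_train_bal = oversampler.fit_resample(X_train_enc, y_train)

# After resampling, both classes have the same number of training
# instances, so the MLP no longer sees a majority class.
```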

Multilayer Perceptron
Multilayer perceptron (MLP) is a neural network consisting of an input layer, an output layer, and one or more hidden layers, and it uses the supervised learning paradigm. MLP is well known for solving problems across a wide range of areas and is considered one of the most versatile architectures (Silva et al., 2017). An illustration of MLP is shown in Figure 2. MLP is used in this study to create a predictive model for classification. MLP has several hyperparameters that need to be configured to obtain the best possible model; Table 5 lists the hyperparameter configurations used.
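As a sketch, the classifier can be built with scikit-learn's MLPClassifier; the hyperparameter values below are illustrative placeholders, since the 18 configurations actually tested are those in Table 5.

```python
from sklearn.neural_network import MLPClassifier

# Train the MLP on the encoded, balanced training set.
# Assumption: the hidden layer size, activation, solver, and learning
# rate below are placeholders; the configurations actually evaluated
# are the ones listed in Table 5.
mlp = MLPClassifier(
    hidden_layer_sizes=(64,),
    activation="relu",
    solver="adam",
    learning_rate_init=0.001,
    max_iter=500,
    random_state=42,
)
mlp.fit(X_train_bal, y_train_bal)
y_pred = mlp.predict(X_test_enc)
```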

Model Evaluation
The evaluation stage is the phase in which the resulting model is evaluated using a testing dataset that the model has never seen. The evaluation is projected onto a confusion matrix, which is then used to calculate the AUC score. The model is evaluated with the AUC score rather than the accuracy score, because accuracy is not a reliable measurement for imbalanced datasets (Korkmaz, 2020) and is prone to the accuracy paradox (Artetxe, 2020). The AUC score can be calculated using Equation 1 (Devi et al., 2017).
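Given the confusion matrix definitions below, a form of Equation 1 consistent with them is the balanced average of the true positive and true negative rates:

$$\mathrm{AUC} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) \qquad (1)$$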
where TP is the number of positive instances in the dataset classified as positive, TN is the number of negative instances classified as negative, FP is the number of negative instances classified as positive, and FN is the number of positive instances classified as negative.
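As a sketch, this score can be computed directly from the confusion matrix of the test predictions; the helper function below is illustrative, not code from the paper.

```python
from sklearn.metrics import confusion_matrix

def auc_from_confusion_matrix(y_true, y_pred):
    """Score of Equation 1: the mean of the true positive rate
    (TP / (TP + FN)) and the true negative rate (TN / (TN + FP))."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

print(auc_from_confusion_matrix(y_test, y_pred))
```

For binary labels this quantity coincides with scikit-learn's balanced_accuracy_score.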

Results and Discussion
The results and analysis of this study are based on the experiments that were carried out. Four scenarios were used, each with a different combination of methods, in order to determine the contribution of each method to the MLP optimization process. Each scenario was run 18 times with different MLP hyperparameter configurations to obtain the best result and the average AUC score. The results are shown in Table 6. The confusion matrix values (TP, TN, FP, and FN) in Table 6 come from the run that produced the best AUC score; the best result is the highest AUC score among the 18 runs of each scenario, while the average result is the mean AUC score over all 18 runs.
Based on Table 6, the AUC score increases from scenario to scenario. In scenario I, the model gives a very low AUC score of 0.5005. The confusion matrix shows that the model successfully classifies almost all negatives (4,580 instances), but in the positive class, which is the minority class, it only manages to classify 10 instances correctly. This shows that the model suffers from bias-to-majority, which makes it sensitive only to the majority class (non-default) while ignoring the minority class (default).
In scenario II, the model achieved a score of 0.6757, considerably higher than the MLP baseline in scenario I. This shows that the algorithm's ability is influenced by the range of the input data: data normalized to the range 0 to 1 yields a higher score than raw data, whose features have wide and differing ranges.
The AUC score in scenario III was higher than in scenarios I and II. This shows that using OHE to encode the nominal categorical features enables the MLP to classify better. The confusion matrix also shows that this scenario correctly classified 545 positive instances (TP), more than the two previous scenarios.
Scenario IV produces the highest AUC score of all scenarios, which shows that optimization with OHE and SVM-SMOTE succeeded in increasing the performance of MLP. SVM-SMOTE resolves the bias-to-majority caused by imbalanced data by creating new synthetic training instances, so the MLP trains on a balanced amount of data in both classes. This makes the MLP sensitive to both classes and raises the AUC score. This scenario resulted in a score of 0.7184, an increase of 0.2179 compared to scenario I. In addition, this scenario also correctly classified 842 positive instances, the most of any scenario. Comparisons were made with previous studies that used MLP for classification on the default of credit card clients dataset. As shown in Table 7, the studies used for comparison were Vishwakarma et al. (2021) and Souza and Torres (2021). Vishwakarma et al. (2021) used several classifiers, with their MLP classifier scoring 0.5000. Souza and Torres (2021) also used MLP as the classification algorithm on the same dataset, producing an AUC score of 0.6506. The proposed method produced the highest score of 0.7184.

Conclusion
Based on the results and discussion described previously, the use of OHE as the encoding algorithm for nominal categorical features and SVM-SMOTE as the oversampling process on the training dataset can cope with imbalanced data in classification problems using the MLP classifier. This study shows that these two techniques can be used to optimize the MLP algorithm, producing a highest AUC score of 0.7184, a significant increase of 0.2179 compared to MLP without the optimization process.