Diagnosis of Brain Tumors Using Two-Dimensional Principal Component Analysis (2D-PCA) with the K-Nearest Neighbor (KNN) Classification Algorithm

The rapid development of computer technology has brought more and more benefits to human life. Computers can now make decisions by imitating the human brain and are used in the health sector to help solve existing problems. One of the technologies used is digital image processing on MRI images of brain tumors. Brain tumor images vary widely and have large dimensions; therefore, an appropriate method is needed to recognize the images well. Dimensionality reduction uses the Two-Dimensional Principal Component Analysis (2DPCA) method, and the classification process uses the K-Nearest Neighbor (KNN) method with the Euclidean distance. From 3 tests on 200 images, the accuracy of the 1st test was 90.0% with 60 test data and 140 training data, the 2nd test was 85.0% with 80 test data and 120 training data, and the 3rd test was 83.0% with 100 test data and 100 training data. Based on these results, the highest accuracy was obtained in the 1st test and the lowest in the 3rd test: the larger the amount of training data relative to the test data, the higher the accuracy obtained. This research is expected to be a reference for further research so that the results obtained are more optimal.
MRI images of the brain are complex and vary in intensity. The advantages of MRI include high-resolution images, and it is safe to apply to the brain because it does not involve ionizing radiation. However, interpreting or reading MRI images takes a long time, so image segmentation is needed. Image segmentation aims to separate the tumor region from normal areas (Balafar et al., 2010).
In image analysis applications, dimensionality problems are commonplace and can degrade the performance of a given algorithm as the number of features increases (Turk & Pentland, 1991). Principal Component Analysis (PCA) is one of the most popular multivariate techniques for data reduction in image analysis, pattern recognition, and machine learning (Kaya et al., 2017). PCA, also known as the Karhunen-Loève expansion, is a classic feature extraction and data representation technique widely used in pattern recognition and computer vision (Turk et al., 1991). 2D-PCA has two critical advantages over PCA: first, it is easier to evaluate the covariance matrices accurately; second, less time is required to determine the corresponding eigenvectors. Because 2D-PCA is based on the image matrix directly, it is easier to use for image feature extraction, and it outperformed PCA, FDA, ICA, and KPCA in recognition accuracy in all trials (Senthilkumar & Gnanamurthy, 2016).
The classification process is as important as the feature extraction process: after the essential features of the tumor are generated by feature extraction, these features are used for classification. K-Nearest Neighbor (KNN) is a very simple algorithm that nevertheless works very well. It is applied widely, from vision, DNA sequencing, and computational geometry to data mining and many other fields (Maheswari & Babu, 2015).

Two-dimensional Principal Component Analysis (2DPCA)
Computationally, 2DPCA performs better than PCA because its covariance matrix is obtained directly from the image matrix; there is no need to transform the matrix into a one-dimensional vector as in the PCA method (Yang et al., 2004). The 2DPCA method has two significant advantages over the PCA method: first, it is easier to evaluate covariance matrices accurately; second, less time is required to determine the corresponding eigenvectors (Oliveira et al., 2011).

Step 2
The next stage is the calculation of the mean of the training set matrix, as shown in formula (1):

Ā = (1/M) Σ_{j=1}^{M} Aj	(1)

Step 3
Calculate the difference of each image from Ā, as shown in formula (2):

Ãj = Aj − Ā	(2)

Step 4
The covariance matrix of the training image set can then be calculated, as shown in formula (3):

G = (1/M) Σ_{j=1}^{M} (Aj − Ā)ᵀ (Aj − Ā)	(3)

Where:
G : covariance matrix
M : number of images
j : 1, 2, 3, ..., M
Aj : brain image matrix
Ā : mean image matrix
(Aj − Ā) : difference matrix, and (Aj − Ā)ᵀ its transpose

Step 5
Determine the eigenvalues and eigenvectors of the covariance matrix generated in Step 4. To determine the eigenvalues and eigenvectors, the Singular Value Decomposition (SVD) method is used. Mathematically, this can be expressed as formula (4):

G v = λ v	(4)

Where:
G : square matrix (n×n)
v : eigenvector
λ : scalar/eigenvalue

Each eigenvalue corresponds to an eigenvector, and the eigenvectors are ordered by eigenvalue starting from the largest, λ1 > λ2 > λ3 > ... > λn.
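The steps above can be sketched in NumPy. This is a minimal illustration, not the authors' code: the function name `two_d_pca` and its arguments are hypothetical, and NumPy's `eigh` is used in place of an explicit SVD to obtain the eigenvalues and eigenvectors of the symmetric covariance matrix.

```python
import numpy as np

def two_d_pca(images, d):
    """Sketch of 2DPCA feature extraction (hypothetical helper).
    `images`: array of shape (M, h, w); `d`: number of eigenvectors kept."""
    images = np.asarray(images, dtype=float)
    M = images.shape[0]
    # Step 2: mean of the training set, formula (1)
    mean_image = images.mean(axis=0)
    # Step 3: difference of each image from the mean, formula (2)
    diffs = images - mean_image
    # Step 4: image covariance matrix, formula (3)
    G = sum(A.T @ A for A in diffs) / M
    # Step 5: eigenvalues/eigenvectors of the symmetric matrix G;
    # eigh returns them in ascending order, so reverse to get
    # lambda_1 >= lambda_2 >= ... >= lambda_n
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1]
    V = eigvecs[:, order[:d]]  # leading projection axes ("eigenbrains")
    # Project each image onto the leading axes: Y_j = A_j V, shape (h, d)
    features = np.stack([A @ V for A in images])
    return mean_image, V, features
```

Projecting onto only the `d` leading eigenvectors is what reduces the dimensionality: each h×w image becomes an h×d feature matrix.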

Classification
The concept of classification is assessing data objects to place them in a particular class from among several available classes. Classification involves two primary tasks: (1) building a model as a prototype to be stored in memory, and (2) using the model to recognize/classify/predict another data object so that its class is known from the stored model (Prasetyo, 2012). The classification method also aims to map data into previously defined classes based on the values of the data attributes (Han, Kamber, & Pei, 2012).

K-Nearest Neighbor (KNN)
The K-Nearest Neighbor (KNN) method was first introduced in the early 1950s. The KNN classification method works well when given extensive training data, but it only became popular in the 1960s with the increase in computing power. Since then, this method has been used globally in pattern recognition (Han et al., 2012). KNN classification is a simple, effective, nonparametric method that has been widely used in text classification, pattern recognition, image and spatial classification, and other fields; it finds the closest points with the Euclidean distance formula (Sun, Du, & Shi, 2018). K-Nearest Neighbor (KNN) is an algorithm with reasonably high accuracy (Hidayah, Akhlis, & Sugiharti, 2017). KNN classification is done by comparing the distances between training data and testing data: when testing data is input, KNN looks for the training data closest (in Euclidean distance) to it. The Euclidean distance metric is used to determine the proximity between data points in K-Nearest Neighbor (Dhriti & Kaur, 2012).
The following function is used to find the Euclidean distance [6], as shown in formula (5):

d(x1, x2) = √( Σ_{i=1}^{p} (x1i − x2i)² )	(5)

Where:
x1 : training data
x2 : testing data
i : data variable
d : distance
p : data dimension
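The Euclidean distance of formula (5) and the KNN majority vote can be sketched as follows. This is an illustrative implementation under assumed names (`euclidean_distance`, `knn_predict`), not the authors' code; it operates on flattened feature vectors.

```python
import numpy as np
from collections import Counter

def euclidean_distance(x1, x2):
    # Formula (5): d(x1, x2) = sqrt(sum_i (x1_i - x2_i)^2)
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    return float(np.sqrt(np.sum((x1 - x2) ** 2)))

def knn_predict(train_feats, train_labels, test_feat, k=3):
    """Label a test sample by majority vote among its k nearest
    training samples (hypothetical helper)."""
    dists = [euclidean_distance(f, test_feat) for f in train_feats]
    nearest = np.argsort(dists)[:k]          # indices of the k closest
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]        # most frequent class wins
```

With k = 1 this reduces to assigning the class of the single closest training image, which matches the "smallest Euclidean distance" rule described later in the classification section.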

Brain Tumor Database Components
In this study, the Two-Dimensional Principal Component Analysis (2D-PCA) model for diagnosing brain tumors is applied in a system built with the Django framework and the Python programming language. The application requires data on people with and without brain tumors; the data is then processed and classified, and the output of the tests is the level of accuracy in diagnosing brain tumors. The data used in this study is Magnetic Resonance Imaging (MRI), a public dataset available on Kaggle at https://www.kaggle.com/simeondee/brain-tumor-images-dataset/data. The MRI dataset consists of two classes, tumor and non-tumor, with 100 tumor images and 100 non-tumor images; samples can be seen in Figure 1 (a. image of a brain tumor, b. image that is not a brain tumor). Before the feature extraction process, the dataset is divided manually into two parts: training data and testing data. The training data is used as experimental material, and the testing data is used for testing with several compositions, as shown in Table 1.

Testing Process
The testing process was carried out three times with different data compositions to determine the accuracy of the method used. The stages of the testing process are explained as follows.

Data Partition
Data distribution is done manually by dividing the data into two parts: training data and test data.
The training data will be used as experimental material, and the test data will be used as data for testing. The distribution of training data and test data can be seen in Table 1.
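The manual split described above can be sketched as a small helper. This is an assumed illustration (the function `partition` and the use of a fixed seed are not from the paper), showing how 200 image paths could be divided into the compositions of Table 1, e.g. 140 training and 60 test images.

```python
import random

def partition(paths, n_train, seed=42):
    """Shuffle image paths and split them into training and test lists.
    `n_train` is the number of training images (e.g. 140, 120, or 100)."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)  # fixed seed for a repeatable split
    return paths[:n_train], paths[n_train:]
```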

Input Test Image
After determining the database and partitioning the data, the next step is testing. Tests are carried out on all test images contained in the database. At this stage, the user enters the test image into the system.

Feature Extraction
The feature extraction stage is carried out by converting the brain tumor image into a matrix. In the 2DPCA method, the covariance matrix is obtained directly from the brain tumor image matrix, and there is no need to convert the matrix into a one-dimensional vector. The last stage of feature extraction finds the eigenbrain of each brain tumor image; to obtain the eigenbrains, 2DPCA computes the covariance matrix of the training brain tumor images. The feature extraction process for brain tumor images can be seen in Figure 2.

Classification
Classification is the process of matching the class of the test image with the training images. The K-Nearest Neighbor (KNN) method is used in this study. The first step in this classification is to find the eigenbrain values of the training images; the second is to calculate the Euclidean distance between the training images and the test image. The class with the smallest Euclidean distance is considered the most similar to the test image. The K-Nearest Neighbor classification flow chart can be seen in Figure 3.

Accuracy Calculation
The accuracy calculation process is used to determine how accurately the system recognizes brain tumor image classes. The confusion matrix records the decisions obtained in training and testing and provides an assessment of classification performance based on whether objects are classified correctly or incorrectly.
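The confusion matrix and the accuracy percentage it yields can be sketched as follows, assuming the binary labeling tumor = 1 and non-tumor = 0; the helper name `confusion_and_accuracy` is illustrative, not from the paper.

```python
def confusion_and_accuracy(y_true, y_pred):
    """Binary confusion matrix (tumor=1, non-tumor=0) and the
    accuracy percentage used to evaluate the system."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Accuracy = correctly classified objects / all objects, in percent
    accuracy = 100.0 * (tp + tn) / len(y_true)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}, accuracy
```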

Results Analysis
In this study, the authors conducted experiments three times with different amounts of training data and test data to determine the method's accuracy. The brain tumor detection system was tested using Kaggle data consisting of two classes, each with 100 brain tumor images, for a total of 200 images. All brain tumor images were divided into training data and test data as in Table 1. The files are in JPG format, and each image is 200x200 pixels. The results of brain tumor feature extraction are represented as a brain tumor matrix. The test results can be seen in Table 2.
In the 2DPCA method, the covariance matrix is obtained directly from the brain tumor image matrix, and no transformation into a one-dimensional vector is required. The last stage of feature extraction is to calculate the eigenbrain of each brain tumor image. The weight values of the eigenbrains are used to identify the test image by finding the distance weights between the test image and the training images; the smallest distance weight indicates the training image most similar to the brain tumor test image. The last stage of the classification process is the accuracy calculation, performed using a confusion matrix; it evaluates the method's success by calculating its percentage accuracy.
Based on Table 2, the highest accuracy of the 2DPCA + KNN method was obtained in the 1st test, 90.0% with 140 training images and 60 test images, and the lowest accuracy was obtained in the 3rd test, 83.0% with 100 training images and 100 test images.

Conclusion
From the image processing research, design, implementation, and system testing through to the diagnosis of brain tumors using the 2DPCA method with KNN classification, it can be concluded that feature extraction is used to obtain patterns/characteristics in an image, which are then used as a reference to distinguish one image from another. The brain tumor feature extraction results are represented as a brain tumor matrix. In the Two-Dimensional Principal Component Analysis (2DPCA) method, the covariance matrix is obtained directly from the brain tumor image matrix, and there is no need to transform the matrix into a one-dimensional vector; the last stage of feature extraction finds the eigenbrain of each brain tumor image. The distance between weight values is calculated using the Euclidean distance, and the smallest distance weight indicates the training image most similar to the brain tumor test image. The last stage of the classification process is the accuracy calculation, which evaluates the success of the method by computing its percentage accuracy. The highest accuracy with the 2DPCA + KNN method was 90.0%, obtained in the 1st test with 140 training images and 60 test images.