Face Identification Based on K-Nearest Neighbor

Face identification is now widely applied for security in gadgets, smart homes, and other settings. The face is the biometric expected to grow the most over the next few years, and face-based identification is considered successful and accurate compared with several other types of biometrics. Face recognition utilizes facial features for security purposes. The classification method used in this paper is K-Nearest Neighbor (KNN), which predicts the class of a new instance from the classes of its nearest neighbors and belongs to the family of instance-based learning methods. Feature extraction uses Principal Component Analysis (PCA). The preprocessing methods used in this research are contrast stretching, grayscale conversion, and Haar cascade segmentation. This research registered 30 people; each person had 3 images used for training and 2 images used for testing. Among the k values tested, the best accuracy was 81%, obtained with k=1.


INTRODUCTION
The rapid development of technology needs to be balanced with appropriate security enhancements so that users are comfortable keeping personal information on their devices. The face is one of the most popular types of biometrics at the moment. It is the biometric projected to grow the most in the next few years, at 38%, followed by multimodal (22%), iris (11%), and fingerprint (9%) [1].
Facial biometrics is a personal identification system based on features that differ greatly from one person to another. This distinctiveness, together with ease of acquisition, makes the face a widely applied biometric. Research related to face identification was previously carried out by Changxing Ding, Chang Xu, and Dacheng Tao in their work entitled Multi-task Pose-Invariant Face Recognition. Face images captured in unrestricted environments usually contain significant pose variations, which dramatically reduce the performance of algorithms designed to recognize frontal faces. That study developed a face verification algorithm for significant variations in facial pose. The best results were obtained using the High-dim LBP and Joint Bayesian methods with an accuracy of 93.18% [2].
The study entitled Real-Time Face Detection and Recognition in Complex Background also investigated facial biometrics. It developed an efficient and robust algorithm for real-time face detection and recognition against complex backgrounds. The method combines a series of signal processing techniques: AdaBoost, a cascade classifier, Local Binary Pattern (LBP), Haar-like features, face image pre-processing, and Principal Component Analysis (PCA). The PCA algorithm is used to recognize faces efficiently. The algorithm reaches 99.2% correct facial recognition and a true positive rate of 98.8% for face detection [3].
This study identifies faces in three stages: face detection, feature extraction, and classification. Face detection finds the position of each face in an image so that it can be extracted later. It reports the location of all faces in the given input image, usually in the form of a bounding box. Feature extraction determines the characteristic features of a face, which are then classified or recognized. Classification is the process of matching the input with data in a database [4].
The face is one of the most easily acquired biometrics, since capturing it only requires a camera. This paper develops face identification using Principal Component Analysis (PCA) for feature extraction and K-Nearest Neighbor (KNN) for classification. The result is a program written in the Python programming language for identifying faces.

RELATED WORK
Face identification using PCA (Principal Component Analysis) for feature extraction and KNN for classification has been done before, and that earlier work serves as a reference for building a system that can do more than its predecessors.
One such study is reported in the journal article entitled "Multi-Faces Recognition Process Using Haar Cascades and Eigenfaces Methods". The proposed facial recognition process uses a hybrid of the Haar Cascades and Eigenface methods, which can detect many faces (55 faces) in one detection process. Image pre-processing consists of several stages, namely training data preparation, grayscale conversion, and preprocessing with Haar Cascade. Feature extraction consists of two stages, training and testing: the data are divided into training data and testing data, which are processed using the PCA (Principal Component Analysis) method, better known as the eigenfaces method. Face identification uses the Euclidean distance as the similarity measure. This enhanced face recognition approach is able to recognize many faces with an accuracy rate of 91.67% [5].
Another example is the journal article entitled "Handwriting Recognition using Eccentricity and Metric Feature Extraction based on K-Nearest Neighbors". It proposed a recognition process consisting of several stages, such as thresholding, noise removal, and cropping, before feature extraction and classification. The dataset is divided into training data and testing data. The feature extraction methods used are eccentricity and metric. Eccentricity is the ratio between the minor focal distance and the major focal distance of the ellipse fitted to an object, while the metric is the ratio between the area and the circumference of the object. For classification, the KNN method assigns objects to classes based on the training data nearest to them, with distance computed using the Euclidean distance formula. Testing yielded an accuracy of 85.38% [6].
A further example is the study entitled "Voice Recognition using K Nearest Neighbor and Double Distance Method". It developed a new method, the double distance method, to improve accuracy in the presence of outlier data. The double distance method is combined with the KNN method with k=1 as the core of the voice recognition. The framework consists of two stages, a training process and a testing process. The training process performs feature extraction using Mel Frequency Cepstrum Coefficients (MFCC), while the testing process performs recognition using the KNN method. The testing process is divided into two parts: the first uses the KNN method and the second uses the double distance method. The similarity between testing data and training data is calculated with the Euclidean distance formula. In testing, the accuracy of KNN with k=1 is 84.85% and the accuracy of the double distance method is 96.97%. These results indicate that the double distance method improves the accuracy of voice recognition [7].

METHODS
Face identification using the K-Nearest Neighbor method consists of two phases: the training phase and the testing phase. The dataset comprises 790 images in 158 classes, with each class providing three images for training and two for testing. Figure 1 shows the training phase in detail. The testing phase uses the database of features obtained from training. Figure 2 shows the testing phase in detail.
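The per-class split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the file names, and the choice of taking the first three images of each class for training are assumptions made for the example.

```python
from collections import defaultdict


def split_per_class(samples, n_train=3):
    """Split (label, item) pairs so each class gives n_train items to training.

    The remaining items of each class go to the testing set. Taking the
    first n_train items is an assumption; the paper does not say how the
    three training images were chosen.
    """
    by_class = defaultdict(list)
    for label, item in samples:
        by_class[label].append(item)

    train, test = [], []
    for label in sorted(by_class):
        items = by_class[label]
        train += [(label, item) for item in items[:n_train]]
        test += [(label, item) for item in items[n_train:]]
    return train, test


# Hypothetical file names: 158 classes with 5 images each (790 images).
samples = [
    (f"person_{c:03d}", f"person_{c:03d}/img_{i}.jpg")
    for c in range(158)
    for i in range(5)
]
train, test = split_per_class(samples)
print(len(train), len(test))  # 474 316
```

With 158 classes this yields 474 training images and 316 testing images, which together account for the 790 images in the dataset.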

Data Source
The data come from photographs taken manually. The dataset contains 790 faces from 158 people taken from several angles. The images have not been segmented, so a significant amount of background remains outside the face region. The dataset is organized into folders, each named after the person whose face it contains.

Image Enhancement
Image enhancement is the accentuation or sharpening of image elements such as edges, boundaries, or contrast levels to make the image more useful for analysis and display [8]. Image enhancement in the face identification system using the K-Nearest Neighbor (KNN) method consists of ROI (Region of Interest) extraction, conversion to grayscale, and contrast stretching. Contrast stretching is an image enhancement technique that increases contrast by stretching the range of intensity values in the image so that it spans a desired range. This enhances the information in the image while preserving other details [9]. The following is the image enhancement result of the face identification research using the KNN method.
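As a hedged illustration of the grayscale and contrast stretching steps, the sketch below operates on a tiny image stored as nested lists. The luma weights (0.299, 0.587, 0.114) and the linear min-max form of the stretch are common choices assumed here; the paper does not state the exact formulas it used, and the function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to gray levels.

    Uses the common 0.299/0.587/0.114 luma weights (an assumption).
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def contrast_stretch(gray, out_min=0, out_max=255):
    """Linearly map the image's intensity range onto [out_min, out_max]."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in gray]
    span = out_max - out_min
    return [
        [round((p - lo) * span / (hi - lo) + out_min) for p in row]
        for row in gray
    ]


# A low-contrast 2x2 grayscale patch with values between 50 and 200.
patch = [[50, 100], [150, 200]]
print(contrast_stretch(patch))  # [[0, 85], [170, 255]]
```

The darkest pixel is mapped to 0 and the brightest to 255, so the patch uses the full intensity range while the ordering of gray levels is preserved.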

Feature Extraction
Feature extraction is the stage that finds the characteristic features of an input image. The extracted features are what the classifier relies on to make its decisions. The feature extraction method used in this paper, Principal Component Analysis (PCA), is one of the most popular extraction methods [10]. PCA reduces the dimensions of data with the least amount of information loss [11]. It is used in many fields, such as biometrics, image processing, and data compression. In the PCA method, faces are described as weighted linear combinations of eigenvectors called Eigenfaces. These eigenvectors are computed from the covariance matrix of the image database. The number of Eigenfaces obtained corresponds to the number of images in the database [12]. The following is an example of feature extraction in the face identification system using the Principal Component Analysis (PCA) method.
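The eigenface idea can be sketched in plain Python as below. This is a toy under stated assumptions, not the authors' implementation: images are flattened lists of pixel values, eigenvectors are found by power iteration with deflation on the small image-by-image Gram matrix (a standard shortcut when there are far fewer images than pixels), and all function names and the 4-pixel "images" are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_eigenvectors(matrix, count, iters=500):
    """Leading eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    work = [row[:] for row in matrix]
    n = len(work)
    found = []
    for _ in range(count):
        v = normalize([1.0 + 0.1 * i for i in range(n)])  # fixed start (assumption)
        for _step in range(iters):
            v = normalize([dot(row, v) for row in work])
        eigenvalue = dot(v, [dot(row, v) for row in work])
        found.append(v)
        # Remove the component just found so the next pass finds the next one.
        work = [[work[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return found


def fit_eigenfaces(images, n_components):
    """Return the mean face and n_components unit-length eigenfaces."""
    n = len(images)
    mean = [sum(col) / n for col in zip(*images)]
    centered = [[p - m for p, m in zip(img, mean)] for img in images]
    gram = [[dot(a, b) for b in centered] for a in centered]
    eigenfaces = []
    for v in top_eigenvectors(gram, n_components):
        # Map the Gram-matrix eigenvector back to pixel space.
        u = [sum(v[i] * centered[i][d] for i in range(n))
             for d in range(len(mean))]
        eigenfaces.append(normalize(u))
    return mean, eigenfaces


def project(image, mean, eigenfaces):
    """Weights of an image in eigenface space (its feature vector)."""
    diff = [p - m for p, m in zip(image, mean)]
    return [dot(diff, u) for u in eigenfaces]


# Toy data: two "people", two 4-pixel images each.
images = [[10, 20, 30, 40], [12, 22, 29, 41],
          [50, 60, 10, 5], [52, 58, 12, 6]]
mean, eigenfaces = fit_eigenfaces(images, n_components=2)
weights = [project(img, mean, eigenfaces) for img in images]
for w in weights:
    print([round(x, 2) for x in w])
```

Each image is reduced from four pixel values to two weights, and images of the same toy "person" end up close together in that space. These weight vectors are the kind of features a distance-based classifier can compare.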

KNN Classification
Face classification is the stage that matches testing data against training data from the face dataset. KNN is one of the simplest algorithms that can be used for classification, and despite its simplicity it is quite effective. The method was first proposed by T. M. Cover and P. E. Hart in 1967 [13] and has since been modified to improve its performance. The basic concept of KNN is that a testing sample is assigned a class according to the classes of its k nearest training samples. If k=1, the testing sample is assigned to the class of its single nearest neighbor. Finding the right k value for a particular problem is itself a difficulty that affects the performance of KNN [14]. The classification stages in the face identification system using the K-Nearest Neighbor (KNN) method, which takes the eigen images from the feature extraction process as input, are as follows.
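A minimal sketch of KNN classification with the Euclidean distance is given below. The feature vectors and labels are made up for illustration, and the tie-breaking rule (prefer the label of the nearest neighbor among the tied labels) is an assumption; the paper does not describe how ties are resolved.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=1):
    """Predict a label by majority vote among the k nearest training samples.

    train is a list of (feature_vector, label) pairs. Ties are broken in
    favour of the tied label whose sample is nearest (an assumption).
    """
    neighbors = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    best = max(votes.values())
    for _, label in neighbors:  # neighbors are ordered nearest first
        if votes[label] == best:
            return label


# Hypothetical 2-D feature vectors for two people, "A" and "B".
train = [
    ([0.0, 0.0], "A"), ([0.1, 0.2], "A"),
    ([5.0, 5.0], "B"), ([5.2, 4.9], "B"),
]
print(knn_predict(train, [0.2, 0.1], k=1))  # A
print(knn_predict(train, [4.8, 5.1], k=3))  # B
```

With k=1 the query simply takes the label of its closest training sample, which matches the k=1 case described above.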

RESULT AND DISCUSSION
Tests were carried out using the manually collected face dataset. The face identification system using the K-Nearest Neighbor method was tested while varying the parameter k, the number of nearest neighbors considered.
Table 1 shows the test of the K-Nearest Neighbor (KNN) method with k=1 using face data from 30 classes, each with 3 training images and 2 testing images, which obtained an accuracy or F1-score of 81%. Every person can be identified, but a few testing images cannot. We suspect that the face position in those testing images is not clear enough for them to be identified.
Table 2 shows the test of the K-Nearest Neighbor (KNN) method with k=2 using face data from 30 classes, which obtained an accuracy or F1-score of 53%. Seven people cannot be identified at all and have an F1-score of zero. The F1-score is zero because precision and recall are both zero, which means that no nearby neighbor could serve as a candidate for the identification.
Table 3 shows the test of the K-Nearest Neighbor (KNN) method with k=3 using face data from 30 classes, where the accuracy or F1-score is 47%.
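To make the per-person scores concrete, the sketch below computes precision, recall, and F1-score for each class from lists of true and predicted labels. The labels are invented for the example, and averaging the per-class F1 values (macro averaging) is an assumption, since the text reports a single "accuracy or F1-score" without stating how it was aggregated.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every true label."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical results: three people with two testing images each.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "B", "A", "B", "B"]

scores = f1_per_class(y_true, y_pred)
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for label, (precision, recall, f1) in scores.items():
    print(label, round(precision, 2), round(recall, 2), round(f1, 2))
print("macro F1:", round(macro_f1, 2), "accuracy:", round(accuracy, 2))
```

In this toy case person "C" is never predicted correctly, so precision, recall, and F1-score are all zero for that class, which is the situation described for the seven unidentified people at k=2. The example also shows that macro F1 and plain accuracy need not coincide.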
The face identification system using K-Nearest Neighbor (KNN) with eigenface features is divided into two parts: the training process and the testing process. Of the test scenarios with k=1, k=2, and k=3, the best accuracy is obtained at k=1 with 81%, while k=2 gives 53% and k=3 gives 47%. This shows that the value of k affects accuracy: within the tested range, a higher k value yields lower accuracy for face identification using KNN.

CONCLUSION
In this paper we have presented an experiment in face identification using the KNN method. KNN is one of the simplest algorithms that can be used for classification. The data come from manually taken photographs divided into 30 classes. Face identification using the KNN method consists of two stages, the training phase and the testing phase. Changing the parameter k gives a different result for each value: the accuracy is 81% for k=1, 53% for k=2, and 47% for k=3. These results show that the value of k greatly affects the accuracy of the system. For the values tested, accuracy decreases as k increases, so we conclude that a higher k value gives lower accuracy for face identification using KNN.