Application of Deep Learning Using Convolutional Neural Network (CNN) Method for Women’s Skin Classification

Purpose: Facial skin is the skin that protects the inner part of the face such as the eyes, nose, mouth, and other. The skin of the face consists of some type, among others, normal skin, oily skin, dry skin, and combination skin. This is a problem for women because it is hard to get to know and distinguish the type of peel, this is what causes some women’s, it is hard to determine which product cosmetology and proper care for her skin type. Methods: In this study, the method of the Convolutional Neural Network (CNN) is an appropriate method to classify the type of the skin of women of age 20 – 30 years by following a few stages using Python 3.5 with a depth of three layers. In this study, the method used CNN to distinguish the type of skin of the label object of the type of skin that a normal skin type, oily, dry and combination. A combination skin type is composed of normal and dry skin types. Result: The process of learning network CNN to get the results of the value by 67%. As for the classification of Normal skin 100%, the type of the skin of the face 100% Dry, kind of Oily facial skin 100% and combination skin type (Normal and Dry) to 100%. Novelty: It can be concluded that the use of the method of CNN in automatic object recognition in distinguishing the type of leather as a material consideration in determining the object of the image. And the classification method using CNN with the Python program to be able to classify well.


INTRODUCTION
Most people, especially women pay attention to good appearance in terms of fashion, lifestyle, and the most concern is face. Every woman has a different face shape, color, and texture depending on heredity, conditions in the body, and different climates. The most visible thing on the face from the naked eye is on the skin. Facial skin is skin that protects the inside of the face such as the eyes, nose, mouth, and others. To look perfect, many people are willing to spend a lot of money to maintain the health of their skin. Facial skin classified into several types including normal, combination, oil, dry, and sensitive skin [1].
For that, every woman must know the type of skin on her face. Most women do not understand the characteristics of their own facial skin type and when they want to do care and use make up which is supposed to support the appearance of these women, they experience a mismatch between the products used and the type of facial skin itself. Usually, this discrepancy caused by using the same facial makeup as his friend even though the skin type on his face is different.
The mismatch between the use of care products and facial makeup with this type of facial skin causes damage to facial skin such as dullness, allergies, minor irritation, enlarged pores, the appearance of black and white blackheads, and can also cause acne, therefore every woman should consult a beauty doctor who focuses on the face and skin in order to find out what skin type she has.
The face is the most important thing that often noticed by women, where they dream of a clean, nice and healthy face to support their appearance because the face is the first part that other people see [2]. Good and healthy facial skin can be had by doing facial treatments which can be classified according to the skin type. Usually, the type of facial skin that is commonly owned by every woman, there are two types, namely oily and dry skin where oily skin is seen in the pores of the skin on the cheeks that are clearly visible and the skin looks dull and sticky, while dry skin looks not moist, thin, and brittle on the cheeks [3].
Convolutional Neural Network is a deep learning method that is able to recognize and detect an object in a digital image. This is largely due to stronger computation factors, large datasets, and techniques to train deeper networks. This Convolutional Neural Network capability is claimed to be the best method in terms of detecting and recognizing objects, but it is the same as other deep learning methods which have weaknesses in the training process which requires a long time [4].
Fenti, Wawan, and Yaya have used the CNN method for facial recognition-based automatic attendance recording system which is one of the images that can be analyzed by this method. The results obtained from 2,400 face image data are divided into two: 1,200 face images see the camera and 1,200 face image data do not see the camera, namely an accuracy of 93.33% which depends on the conditions of input image capture, face detection, and classification process [5].
The purpose of this research on facial skin type classification is to analyze the facial skin type of each individual, to analyze how a woman's facial skin is treated based on facial skin type, and to prove that each woman's facial skin type is different. This classification research was carried out because every woman still has difficulty determining the type of facial skin and it is difficult to determine facial care and makeup products according to her facial skin type.

Research Methods Convolutional Neural Network
The method used in this research is Deep Learning Convolutional Neural Network (CNN). CNN is a type of neural network that has the advantage of having a very high level of accuracy in classifying. CNN divides its work into several layers in which there are three main layers, namely Convolutional layer, Pooling layer, and Full Connected layer. Convolutional layer is used to extract data features that will be used for training, then the pooling layer is used to create new filters based on the desired rules, and finally the fully connected layer is actually MLP (Multilayer Perceptron) which is part of an artificial neural network and consists of a number of neurons linked by connecting weights [6].
CNN is a variation of the Multilayer Perceptron inspired by human neural networks based on the findings of Hubel and Wiesel who conducted visual cortex research on cats' visual senses. This study is very inspired by the ways it works because in the results of research, the visual cortex in animals is very powerful in the visual processing system that has ever existed [7]. Figure 1 shows CNN method layer sequences. Deep Learning is a science in the field of Machine Learning that develops due to the development of GPU acceleration technology which has excellent capabilities in computer vision. One of its capabilities is in the case of object classification in images. To implement Deep Learning which can be used for object image classification, namely CNN. The way CNN works is similar to MLP, but in CNN, each neuron is presented in two dimensions. MLP accepts one-dimensional data input and propagates the data on the network to produce output. Each link between neurons in two adjacent layers has a one-dimensional weight parameter that determines the quality of the mode. At each input data layer, a linear operation is carried out with the existing weight values, then the computation results are transformed using a non-linear operation called the activation function. In CNN, the data propagated on the network is two-dimensional, so the linear operation and weight parameters on the CNN are different. In CNN, linear operations use convolutional operations, while the weight is no longer one-dimensional but four-dimensional which is a collection of convolutional kernels. A CNN consists of several types of layers, namely the Convolutional Layer, Subsampling Layer, and Fully Connected Layer.
The CNN method consists of two stages. The first stage is image classification using feedforward and the second stage is the learning stage using the backpropagation method. The feedforward process is the first stage that will produce several layers to classify image data which uses updated weights and biases from the backpropagation process. This stage will also be used again during the testing process. The backpropagation process is the second stage where the results of the feedforward process are traced from the output layer to the first layer. To indicate that the data has been traced, new weights and biases are obtained. Prior to classification, a preprocessing is done with the wrapping and cropping methods to focus on the object to be classified. Furthermore, training is carried out using the feedforward and backpropagation methods. Finally, the classification stage using the feedforward method with updated weights and biases. The test results from the classification of object images with different levels of confusion on the Caltech 101 database produce an average accuracy value of 20% -50%. So it can be concluded that the CNN method used is capable of performing a good classification [5].
Convolutional Neutral Network is used to divide images into multiple non-overlapping regions; the basis of object detection and classification is based on feature extraction, which is aimed at extracting the most effective essential features that reflect the target. Every aspect is closely related, so every effort should be made to achieve satisfactory results [8].
CNN which is hierarchical neural network possesses huge representational capacity that learns the good features at every layer of the visual hierarchy. It has also been effectively applied to a lot of vision problems like visual object recognition and handwriting recognition. These features are automatically extracted from the input image having the benefit of being invariant to the shift and shape distortions of the input textual images. CNN includes a number of convolutional and sub-sampling layers which are optionally accompanied by Fully Connected Layers (FCL). The FCL are uniform to the layers in a standard Multi-Layer Perceptron. Yet, MLP offer two borders in classification tasks: To begin with, there is absence of theoretical relationship between the classification task and the MLP structure. Next, MLP drift hyper-planes separation surfaces, in feature representation space, that are not optimal in terms of margin between the examples of two different classes.
Convolutional Neural Networks are composed of an automatic feature extractor and a trainable classifier. CNN are exploited to learn complex, high dimensional data, and differ in how convolutional and subsampling layers are inquired into. The difference is in their architecture. Many CNN architectures are suggested for different problems among which object recognition and handwriting digit/character recognition. The best performance on pattern recognition task was achieved. In addition, to guarantee some degree of invariance to scale, shift and distortion, CNN mix three main hierarchical aspects such as local receptive fields, weight sharing and spatial sub-sampling [9].
CNNs that try to simulate biological aspects of human beings on computers required pre-processing of images or data before feeding them to the network. When the ConvNet was first invented, however, it was described as a neural network that requires minimal pre-processing of images before feeding them to the network, and a system that is capable of extracting the features from images to optimize the learning performance of the neural network. The ConvNet comprises both feature extraction and classification phases in a single network. A traditional ConvNet consists of three layers: convolution, pooling, and fully connected layers. Feature extraction is performed in the convolutional layer by applying masks, which is the process of dividing images into a predefined dimension of segments and using filters to extract features from the image. Then a feature map, which is the projection of features on the 2D map, is created by applying an activation function to the values obtained by the masks. The activation function activates the most knowledgeable neurons in a nonlinear way and reduces the computational cost of the neural network.
Several activation functions are available in CNNs, and the rectified linear unit (ReLU) is the most commonly used activation function; it does not activate all the neurons at the same time, and therefore provides a faster convergence when the weights find the optimal values to produce the trained response during the training. A pooling operation is performed on the produced feature map to reduce the dimensions of the images. Finally, the feature map is flattened into a vector and sent to the fully connected layer. The convergence of the neural network and the classification of the input patterns are performed in the fully connected layer, and its principles are based on error backpropagation to update the weights within this layer. Deep ConvNets were applied in several image recognition applications with high accuracy, and this increased its reliability for future research [10].
Convolutional neural networks are deep artificial neural networks. We can use it to classify images (e.g., name what they see), cluster them by similarity (photo search) and perform object recognition within scenes. It can be used to identify faces, individuals, street signs, tumors, platypuses and many other aspects of visual data. The convolutional layer is the core building block of a CNN. The layer's parameters consist of a set of learnable filters (or kernels) which have a small receptive field but extend through the full depth of the input volume. During the forward pass, each filter is convolved across the width and height of the input volume, computing the dot product, and producing a 2-dimensional activation map of that filter. As a result, the network learns when they see some specific type of feature at some spatial position in the input. Then the activation maps are fed into a down sampling layer, and like convolutions, this method is applied one patch at a time. CNN has also fully connected layer that classifies output with one label per node [11].
Convolution Neural Network (CNN), often called ConvNet, has deep feed-forward architecture and has astonishing ability to generalize in a better way as compared to networks with fully connected layers describes CNN as the concept of hierarchical feature detectors in biologically inspired manner. It can learn highly abstract features and can identify objects efficiently. The considerable reasons why CNN is considered above other classical models are as follows. First, the key interest for applying CNN lies in the idea of using concept of weight sharing, due to which the number of parameters that needs training is substantially reduced, resulting in improved generalization. Due to lesser parameters, CNN can be trained smoothly and does not suffer overfitting. Secondly, the classification stage is incorporated with feature extraction stage, both uses learning process. Thirdly, it is much difficult to implement large networks using general models of artificial neural network (ANN) than implementing in CNN. CNNs are widely being used in various domains due to their remarkable performance such as image classification, object detection, face detection, speech recognition, vehicle recognition, diabetic retinopathy, facial expression recognition and many more.
The motivation of this study is to establish a theoretical framework while adding to the knowledge and understanding about CNN. The purpose of this study is to present the amalgamation of the elementary principles of CNN and providing the details about the general model, three most common architectures and learning algorithms. A new learning technique, ADAM proposed by has also been elucidated. In addition to that, it computes learning rate for every individual parameter. The complete structure of the sections is as follows. Section 1 gives the introduction and provides the purpose of the study. Section 2 describes the general model of CNN along with all its elementary concepts. Section 3 introduces various architectures of CNN. Section 4 portrays the learning algorithm and Section 5 illustrates the conclusion and future scope [12].
Convolutional neural network (CNN) is a class of feedforward neural networks including convolution computation and deep structure. It is one of the representative algorithms of deep learning . CNN has been widely used in many fields. Particularly in the field of recognition, CNN is used for handwritten digit recognition, speech recognition, facial expression recognition, human face recognition, refrigerator fruit and vegetable recognition, verification code recognition, traffic sign classification and recognition, and so on. In the field of image recognition, the images can be directly made the input of CNN, which reduces the complexity of the experiment. Further more, the image information can be passed through the forward propagation to the convolution layer and the downsampling layer by CNN. At the same time, it can be handled in the different network layers, which avoids the extraction process of the complex features in the traditional algorithms.
The most basic features of the images can be accessed by the neuron in the local sensing domain of CNN. CNN still remains a high degree of invariance in extracting complex features of images, regardless of shift, scaling, deformation, rotation, or other forms of deformation of the image. In recent years, because machine learning does not need to change the topological structure of the images, it is very popular in image recognition. Convolution neural network (CNN) is not only one of the deep learning but also one of the artificial neural networks, which mainly is used in the fields of speech analysis and image recognition [13].
CNN is a type of deep learning model for processing data that has a grid pattern, such as images, which is inspired by the organization of animal visual cortex and designed to automatically and adaptively learn spatial hierarchies of features, from low-to high-level patterns. CNN is a mathematical construct that is typically composed of three types of layers (or building blocks): convolution, pooling, and fully connected layers. The first two, convolution and pooling layers, perform feature extraction, whereas the third, a fully connected layer, maps the extracted features into final output, such as classification. A convolution layer plays a key role in CNN, which is composed of a stack of mathematical operations, such as convolution, a specialized type of linear operation. In digital images, pixel values are stored in a two-dimensional (2D) grid, i.e., an array of numbers, and a small grid of parameters called kernel, an optimizable feature extractor, is applied at each image position, which makes CNNs highly efficient for image processing, since a feature may occur anywhere in the image. As one layer feeds its output into the next layer, extracted features can hierarchically and progressively become more complex. The process of optimizing parameters such as kernels is called training, which is performed so as to minimize the difference between outputs and ground truth labels through an optimization algorithm called backpropagation and gradient descent, among others [14].
In order to compensate for the shortcomings of these multilayer neural networks, a new neural network structure called Convolutional Neural Network (CNN) has been proposed. CNN is a neural network structure proposed in the 1990s. It is a machine learning model based on. CNN is a convolution layer, pooling It consists of several levels of layers, fully-connected layers. Among them the convolution layer and the pooling layer extract features of the input data. And the fully-connected layer classifies the previously extracted features. Plays a role. Unlike other algorithms, the CNN process has features since the step of extracting is included inside, the characteristics of the input data the advantage of direct calculation is possible without a separate pre-processing process to extract.
Also, because it learns using the features of the data, it shows good performance in feature extraction and classification, and if you use the dropout technique it can also prevent overfitting problems, so it is widely used in the field of image recognition. It is being used when determining the structure of the CNN, the size and stride of the kernel, and the number of channels the hyperparameter of the back should be set. These parameters are CNN Not only determines the overall structure of the system, but also determines the learning time, accuracy, etc. Because it directly affects performance, it does not fall into overfitting problem. Optimizing hyperparameters to achieve the expected results the work must be preceded. But so far, the hyperparameter It is common that there are no rules for optimizing and hyperparameters when making a decision, much of it must be made based on experience or the designer's intuition [15].
The CNN is one of the most widely used deep learning algorithms, having recently gained much interest in the field of image processing. The CNN is superior to other DNN algorithms due to its ability to preserve the spatial information by maintaining the interconnection between pixels. The CNN is one type of feedforward neural network in which an input passes through one or multiple layers of ''neurons.'' Each neuron represents a linear combination of inputs that passes through a typically nonlinear function, called the activation layer, and then passes to the next layer. The model can then be trained with a backpropagation algorithm. The goal of training is to update sets of weight matrices and bias vectors to minimize the loss function, that is, the distance between the estimation and observation. A CNN network is typically constructed with one or more convolution layers and pooling layers [16].

Python
Python is one of the programming languages that are popular in the world of work, especially in Indonesia and in academic fields that are suitable for the use of the Python programming language, namely computation science, robotics, data science, economics, space, and various other fields. By default, Python is included in several Linux-based operating systems such as Ubuntu, Linux Mint, and Fedora. Python is a multipurpose interpretive programming language developed by Guido van Rossum in 1990 at CWI, Amsterdam as a continuation of the ABC programming language. This programming language has a design philosophy that focuses on the level of code readability and it was claimed to be a language that combines capabilities, capabilities, with very clear code syntax, and is equipped with a large and comprehensive standard library functionality.
Python can be used for various software development purposes and can run on various operating system platforms such as Linux / Unix, Windows, Mac OS X, Java Virtual Machine, OS / 2, Amiga, Palm, and Symbian (for Nokia products) [17].
Python has a license that does not contradict either the Open Source definition and the General Public License (GPL) so that Python can be used for free, even for commercial purposes. Python has a data structure is a high level of efficiency. The approach used in Python also simple but very effective for objectoriented programming. Python also has the syntax of a dynamic and elegant [18].
Python provides strong support for integration with other programming languages and tools help others. Python comes with libraries of standards that can be expanded and can be learned in just a few days. Programming language interpretive multipurpose with the philosophy of the design focuses on the level of readability of the code [19].
This study uses the approach of python to take the source code which is there on CNN. So it can be seen the value of the accuracy of each type of skin. So the testing data and training studied to produce the value of good accuracy.

Mathematical Equations
CNN generally consists of three layers, namely the convolutional layer, the subsample or pooling layer, and the fully connected layer. The convolutional layer shares a lot of weight while the pooling layer performs a subsampling function to produce output from the convolutional layer and reduce data rates from the layers below it. The output from the pooling layer is used as input to several fully connected layers. Convolutional features, which are called feature maps, are the result of a convolutional filter or kernel on the dataset. The convolution process can be written with the following equation: Where I is the input image, K is the kernel or filter used in the convolution process, m is a series of images, and n is the image column. Subsample or pooling is the process of reducing feature maps. The concept of the pooling process is almost the same as the convolution process, namely the convolution filter on the input data. However, the difference between the pooling process in the shifting filter does not overlap in each filter compared to the convolution process [18].

RESULT AND DISCUSSION
This study used data taken from primary data then computed using the Convolutional Neural Network (CNN) method and for the primary data itself as many as 20 data consisting of 12 training data and 8 validate data which have been divided into several types of skin, namely normal skin, oily skin, dry skin, and combination skin.
The research conducted by this writer prepared several materials and tools that were used as a support, among others: 1. Cell phones from each source for taking pictures as a dataset as shows in Table1. 2. Laptops as a medium for processing datasets with the following specifications as shows in Table 2. 3. Software Anaconda3 v5.3.0 as an intermediary to access the editor that researchers use, namely Spyder (Python 3.5).
The research that the author conducted used observation and interview techniques to the sources we were looking for. The observation technique that the researchers conducted was in the form of taking pictures (photos) of 20 (twenty) female informants from the age of 20-30 years who also included the author and researcher using their respective cellphones with full faces visible. The results of the photos will be sent via the WhatsApp chat application to the researcher and one person sends 1 to 2 images. While the interview technique was carried out online through the WhatsApp chat application in the form of several questions about the condition of the informant's facial skin and the effects of this condition. The questions that the researcher ask are 3 to 5 questions sequentially and completely. The stages of this research can be seen in Figure 2.  Samples that have been collected are used as a dataset where the researchers set the image size to 640 x 480 pixels to generalize because each brand and type of cellphone has a different length and width of the camera. Researchers do cropping through applications outside of programming and the results of the cropping are divided into several skin types according to the results of the interview via chat and change the original color to grayscale color. Researchers divided this dataset into training data and validate data.

Preprocessing
The second stage was preprocessing the image of a woman's face which had been divided into facial skin types. This preprocessing is carried out because the pixel size is large enough that it affects the training process carried out by the "computer" starting from reducing the amount of work on the "computer" itself, accelerating the processing time during the process to increasing the accuracy of the training process. Researchers grayscale and resize through Python 3.5 programming at 32 x 32 pixels in each image as describes in Table 3 below.

CNN Method Training (Feature Extraction and Classification)
The process of data collection is done by collecting photographs of women as much as 20 data. Then from the data included in 1 folder. From the folder in the resize to the size that can be entered into the model, CNN. The Python Program takes source code from CNN, which is already in the encryption. So that the Python program can produce the value of its accuracy.
The next stage is the extraction and classification stage using the Convolutional Neural Network method which consists of 12 training data and 8 validate data. In the CNN method as previously discussed, it consists of several layers, namely the proposed convolutional layer has two convolution layers with a kernel size (3,3) and after that the activation function "relu" or Retrified Linear Unit is added which is useful for eliminating negative values to 0 on the resulting convolution matrix.
After the convolution layer is complete, continue to the pooling layer which aims to reduce the size of the image taken from the convolution layer image using max-pooling with region (2,2) to obtain a new matrix, then then enter the flatten layer where the pooling data that is owned is 2-dimensional array and then converted into one-dimensional data.
And finally, the Fully Connected Layer, which means that each neuron in the previous layer is connected to every neuron in the next layer. In this layer there is a dense with the sigmoid activation function which aims to determine the class classification of the image based on the value of the neurons.

The Analysis
Results of the research process for the classification of women's skin types provide a fairly high accuracy value by having a dataset of 20 images with a comparison of their use into 12 training data and 8 validate data. This CNN model was run as many as 150 epochs using the Keras library in Python programming with the tensorflow back-end. Based on the trials in this study, the comparison for the accuracy value obtained was 67% and the loss value reached -9.2992. This shows that the results of the classification of women's facial skin types are good but still need to be reviewed with the addition of more samples so that the results of the accuracy value can be greater.

CONCLUSION
From the results of research conducted that the skin type of women in Indonesia tends to experience oiliness due to environmental and genetic factors (heredity). Apart from that from this study, what affects the final result is the time vulnerability before taking the picture and also the lighting taken during the shooting.In addition, classification research using the Convolutional Neural Network method produces good accuracy values but needs to be reviewed due to the limited number of samples collected. Taking from the theory of the CNN method itself, most of it was influenced by large datasets, while for this study the dataset collected was still small.