Chaotic Whale Optimization Algorithm in Hyperparameter Selection in Convolutional Neural Network Algorithm

ABSTRACT


Introduction
A Convolutional Neural Network (CNN) is a class of Deep Neural Networks used to complete computer vision tasks (Buda et al., 2018; Y. Wang et al., 2019). The method can process large-scale data and build hierarchical models of data with complex distributions (Xie & Tian, 2019). With this ability, in recent years (2014; 2016; 2017) CNN has also been applied to solve problems in research topics beyond computer vision. Accordingly, new CNN architectures have completed tasks in several research topics more accurately than older architectures (Chen & Liu, 2019; Conneau et al., 2016; Qian et al., 2016). In addition, support from Graphics Processing Unit (GPU) technology makes it easy to quickly apply new CNN architectures to big data (Zhu et al., 2017).
The development of CNN architecture has led to deeper network designs: AlexNet (Hinton et al., 2012) has a deeper architecture than LeNet (Lecun et al., 1998), as do VGG (Simonyan & Zisserman, 2014) and GoogleNet (Szegedy et al., 2015). The advantage of a deep network architecture is a better feature representation, but it also makes the network complex, difficult to optimize, and prone to overfitting (Zhang et al., 2019). Moreover, the selection of hyperparameters in a CNN, as in Neural Networks (NN) generally, also affects the results obtained. Consequently, designing and using a CNN requires knowledge, experience, and expertise in the data domain to obtain optimal results (Bibaeva, 2018; B. Wang et al., 2018).
To solve this problem, Lee et al. (2018) applied a parameter-settings-free harmony search algorithm to find the optimal hyperparameters of the CNN method. In their study, the architecture was fixed and the hyperparameter search was limited to the settings of the convolutional and pooling layers. Strumberger et al. (2019) then proposed the Tree Growth Algorithm (TGA) to search for CNN hyperparameters, focusing only on the structure and architecture of the CNN; their method obtained the smallest error value compared to several state-of-the-art methods. Furthermore, Sun et al. (2020) developed an automatic approach, based on an optimized Genetic Algorithm (GA), for designing the architecture and initializing the weights of a CNN for image classification problems; the optimized GA obtained superior results on several datasets when tested against several state-of-the-art methods. Next, Bacanin et al. (2020) proposed two methods for selecting CNN hyperparameters, namely a Tree Growth algorithm with increased exploitation capability and a Firefly algorithm with increased exploitation and exploration capabilities. Their hyperparameter search covered the number of convolutional layers, the number and size of the convolutional filters, the number of classification layers, and the number of hidden units in the classification layer; the proposed methods obtained the smallest error values. Finally, Gülcü & Kuş (2020) used a modified Microcanonical Optimization Algorithm (MOA) to select optimal hyperparameters for CNN.
Their results were obtained using two approaches, with and without constraints on the hyperparameter search, and the modified MOA obtained the highest accuracy on most of the datasets used in their study.
These studies focus only on searching for the network architecture, the network structure, and the weight initialization of the CNN method; they are limited in that they neither add regularization nor search over the regularization hyperparameters of the CNN. Therefore, this study searches over the network architecture, the network structure, and the regularization used in the CNN network, and performs the search with a different method. The purpose of this research is to build a framework for automating the hyperparameter search of the CNN method using the Chaotic Whale Optimization Algorithm (CWOA), a member of the swarm-intelligence class of metaheuristics, and to determine the accuracy and error it achieves.

The Proposed Method/Algorithm
The selection of optimal hyperparameters for a CNN has an impact on the results obtained. This is evidenced by the trend toward deeper architectures: a deeper design yields a better feature representation, but at the cost of network complexity and a tendency to overfit (Zhang et al., 2019). In addition, using and designing a CNN architecture to solve a problem requires knowledge and experience in the data domain (Bibaeva, 2018; B. Wang et al., 2018). To overcome these problems, the researchers propose the CWOA method for automating the selection of CNN hyperparameters. The stages of the proposed method are shown as a flowchart in Figure 1.

Convolutional Neural Network (CNN)
In the CNN method, three components serve as the main layers in the learning process (Alom et al., 2018): the Convolutional Layer, the Sub-Sampling Layer, and the Classification Layer. The Convolutional Layer performs feature extraction, detecting particular patterns in the image. It works by sliding filter matrices, learned during training, over the image; each filter detects the presence of a certain feature or pattern. The Sub-Sampling Layer then down-samples the input feature maps from the previous layer. Finally, the Classification Layer is the last layer of the CNN and classifies the features processed by the previous layers.
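As a rough illustration of the first two layers, the following is a minimal numpy sketch (not the paper's implementation) of a single valid convolution followed by 2x2 max-pooling; the example image and edge filter are illustrative only:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation): slide the filter over the
    image and take the elementwise product-sum at each position."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """2x2 max-pooling: down-sample the feature map by keeping the maximum
    of each non-overlapping window."""
    h, w = fmap.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
edge_filter = np.array([[1., 0., -1.]] * 3)        # simple vertical-edge detector
fmap = conv2d(image, edge_filter)                  # 4x4 feature map
pooled = max_pool(fmap)                            # 2x2 after sub-sampling
```

In a real CNN the filter weights are learned, and the pooled features eventually feed the Classification Layer.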

CNN Regularization with Dropout and Early Stopping
In training a CNN, one way to speed up the training process while avoiding overfitting is regularization. This study therefore uses two regularization techniques, namely Dropout and Early Stopping. According to Prechelt (2012), Early Stopping halts the learning process when the error on the validation data starts to increase; the network then restores the weights it had before training was stopped. In addition to Early Stopping, this study also uses Dropout regularization, whose function is to avoid overfitting. This is evidenced by the study of Wu & Gu (2015), in which applying dropout to the CNN method improved its performance on several datasets compared to the state-of-the-art methods of the time.
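The two mechanisms can be sketched in a few lines of numpy (a toy illustration, not the paper's training code; the validation-loss sequence and patience value are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.5):
    """Inverted dropout: randomly zero a fraction `rate` of the units during
    training and scale the survivors so the expected activation is unchanged."""
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

def train_with_early_stopping(val_losses, patience=3):
    """Stop once validation loss has not improved for `patience` epochs and
    report the epoch whose weights should be restored (the best one)."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch, best

# Validation loss falls, then rises: training halts and rolls back to epoch 2.
best_epoch, best_loss = train_with_early_stopping([0.9, 0.5, 0.4, 0.45, 0.5, 0.6])
```

At inference time dropout is disabled; the inverted scaling above keeps the training-time expectation consistent with that.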

Whale Optimization Algorithm (WOA)
Mirjalili & Lewis (2016) proposed the WOA, a method belonging to the metaheuristic family. The idea came from observing the hunting behavior of humpback whales. The working process of the WOA consists of several stages, namely Encircling Prey, the Bubble-net Attacking Method, and Search for Prey, each representing part of the humpback whale's hunt. The study found that the WOA is quite competitive in balancing exploration and exploitation when compared to several algorithms such as PSO, GSA, DE, and FEP. For more details, the pseudocode of WOA can be seen in Table 1.
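As a rough illustration of the three stages, the following is a minimal WOA sketch minimizing a toy sphere function. The population size, iteration count, and the simplification of the spiral shape constant to b = 1 are assumptions for the example, not the paper's settings:

```python
import numpy as np

def woa(fitness, dim=5, agents=20, iters=200, lb=-10.0, ub=10.0, seed=1):
    """Minimal Whale Optimization Algorithm: encircling prey, bubble-net
    spiral attack, and random search for prey."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (agents, dim))
    best = X[np.apply_along_axis(fitness, 1, X).argmin()].copy()
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters              # linearly decreases 2 -> 0
        for i in range(agents):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                if abs(A) < 1:                 # exploitation: encircle the best agent
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                          # exploration: follow a random agent
                    rand = X[rng.integers(agents)]
                    X[i] = rand - A * np.abs(C * rand - X[i])
            else:                              # bubble-net spiral around the prey
                l = rng.uniform(-1, 1, dim)
                X[i] = np.abs(best - X[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)
            if fitness(X[i]) < fitness(best):
                best = X[i].copy()
    return best, fitness(best)

sphere = lambda x: float(np.sum(x ** 2))       # toy objective, optimum 0 at the origin
best, score = woa(sphere)
```

In the paper's setting, `fitness` would instead train a candidate CNN and return its loss, which is far more expensive than this toy objective.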

Chaotic Whale Optimization Algorithm (CWOA)
The CWOA is a development of WOA using chaos theory. The combination yields a significant increase in the exploration and exploitation capabilities of WOA (Kaur & Arora, 2018). This improvement helps the metaheuristic avoid premature convergence to a local optimum and reduces the time wasted in searching for the global optimum (Gharehchopogh & Gholizadeh, 2019). Chaos theory refers to deterministic equations that produce random-looking motion (W. Z. Sun & Wang, 2017); in WOA it takes the form of chaotic maps. The chaotic maps used in this study are listed in Table 2.
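One well-known example of such a map (the specific maps used here are in Table 2) is the logistic map; the sketch below shows how a deterministic rule can generate a sequence in (0, 1) that can replace the uniform random draws driving WOA's coefficients. The initial value 0.7 matches the initialization stated later in this paper; using r = 4.0 is a common choice, assumed here:

```python
def logistic_map(x0=0.7, n=5, r=4.0):
    """Logistic map x_{t+1} = r * x_t * (1 - x_t): a deterministic equation
    producing a chaotic, non-repeating sequence in (0, 1)."""
    seq, x = [], x0
    for _ in range(n):
        x = r * x * (1 - x)
        seq.append(x)
    return seq

chaotic = logistic_map()   # fully reproducible, yet irregular
```

Because the sequence is deterministic, two runs with the same seed value traverse exactly the same "random" coefficients, unlike a pseudo-random generator reseeded arbitrarily.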

Overview of the Modified Parameter-Settings-Free Harmony Search (PSFHS) Algorithm
Harmony Search (HS) is a metaheuristic algorithm inspired by the analogy of musical improvisation, in which musicians adjust their playing until the harmony becomes good (Geem et al., 2001). Its advantages are a simple concept and model, fast convergence, and parameters that are not too difficult to set (W. Sun & Chang, 2015). Geem & Sim (2010) modified the method by adding three stages, namely random tuning, rehearsal, and performance, to minimize the manual search for the best parameters. Lee et al. (2018) then modified the performance stage to solve the CNN hyperparameter problem on the MNIST and CIFAR-10 datasets.
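The core HS improvisation loop (before the parameter-settings-free modifications) can be sketched as follows; this is a generic textbook version on a toy objective, with bounds, bandwidth, and memory size assumed for the example, while the HMCR and PAR values of 0.5 match those stated later in this paper:

```python
import numpy as np

def harmony_search(fitness, dim=3, hms=10, hmcr=0.5, par=0.5, iters=500,
                   lb=-5.0, ub=5.0, bw=0.1, seed=2):
    """Basic Harmony Search: with probability HMCR a note is taken from the
    harmony memory (and pitch-adjusted with probability PAR); otherwise it is
    drawn at random from the search range."""
    rng = np.random.default_rng(seed)
    memory = rng.uniform(lb, ub, (hms, dim))
    scores = np.apply_along_axis(fitness, 1, memory)
    for _ in range(iters):
        new = np.empty(dim)
        for d in range(dim):
            if rng.random() < hmcr:                  # memory consideration
                new[d] = memory[rng.integers(hms), d]
                if rng.random() < par:               # pitch adjustment
                    new[d] += rng.uniform(-bw, bw)
            else:                                    # random selection
                new[d] = rng.uniform(lb, ub)
        new = np.clip(new, lb, ub)
        worst = scores.argmax()
        s = fitness(new)
        if s < scores[worst]:                        # replace the worst harmony
            memory[worst], scores[worst] = new, s
    return memory[scores.argmin()], float(scores.min())

best, score = harmony_search(lambda x: float(np.sum(x ** 2)))
```

The PSFHS modifications automate the choice of HMCR and PAR rather than fixing them as this sketch does.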

Research Dataset
This study uses the MNIST dataset with the same settings as Gulcu & Kus (2020). The data are split randomly, taking 50% of the total training data in the dataset. This subset is then divided further: 10% is taken as validation data and the rest as training data.
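With MNIST's 60,000 training images, this split yields 30,000 sampled images, of which 3,000 become validation data and 27,000 training data. A minimal index-based sketch (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

n_total = 60_000                                    # MNIST training-set size
subset = rng.permutation(n_total)[: n_total // 2]   # random 50% -> 30,000 indices
n_val = len(subset) // 10                           # 10% of the subset -> 3,000
val_idx, train_idx = subset[:n_val], subset[n_val:] # validation vs. training indices
```

The indices would then select the corresponding images and labels from the loaded dataset.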

Hyperparameter Limitations and Architecture Setup
To reduce the large computational cost, this research limits the search space for the optimal CNN hyperparameter values. The limitations, modified from Gulcu & Kus (2020), can be seen in Table 3. In addition, some hyperparameters are fixed and do not change; these can be seen in Table 4. The architecture is limited to a maximum of two blocks and a minimum of one block, while the initial architecture is generated randomly. In addition, the last layer of the model does not use a Fully Connected Block but a predefined static layer.
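To make the idea concrete, a bounded search space can be encoded as ranges and decoded from a normalized agent position. The hyperparameter names and bounds below are hypothetical placeholders (the actual values are those in Table 3), except the one-to-two-block limit stated above:

```python
import numpy as np

# Hypothetical search space; the real bounds are defined in Table 3.
SEARCH_SPACE = {
    "n_conv_blocks":  (1, 2),        # matches the fixed one-to-two-block limit
    "n_filters":      (16, 128),
    "kernel_size":    (3, 7),
    "n_hidden_units": (64, 512),
    "dropout_rate":   (0.1, 0.5),
}

def decode(position):
    """Map a real-valued agent position in [0, 1]^d to a configuration by
    rescaling each coordinate into its hyperparameter's range; integer
    ranges are rounded."""
    cfg = {}
    for x, (name, (lo, hi)) in zip(position, SEARCH_SPACE.items()):
        v = lo + x * (hi - lo)
        cfg[name] = v if isinstance(lo, float) else int(round(v))
    return cfg

cfg = decode(np.random.default_rng(0).random(len(SEARCH_SPACE)))
```

Each optimizer agent then moves in the normalized space while the decoded configuration is what actually builds and trains a candidate CNN.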

Fitness and Parameter Settings
The fitness evaluation determines the best agent in the population for the CWOA, WOA, and modified PSFHS methods: the best agent is the one with the smallest error value in the population. The error value comes from the loss function of the CNN during training, namely Categorical Cross-Entropy. Furthermore, the chaotic map is initialized with the value 0.7 in the CWOA method, while for the modified PSFHS the HMCR value is 0.5 and the PAR value is 0.5.
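The fitness value is therefore the standard categorical cross-entropy; a minimal numpy sketch (the toy labels and predictions are illustrative only):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy: -sum(one_hot * log(p)) per sample,
    averaged over samples. Serves as the error (fitness) of a candidate
    network: smaller is better."""
    p = np.clip(y_pred, eps, 1.0)                       # guard against log(0)
    return float(-np.mean(np.sum(y_true * np.log(p), axis=1)))

# A confident correct prediction scores near 0; a confident wrong one is penalized.
y_true = np.array([[0., 1.], [1., 0.]])                 # one-hot labels
good   = np.array([[0.05, 0.95], [0.9, 0.1]])
bad    = np.array([[0.95, 0.05], [0.1, 0.9]])
```

The optimizer ranks agents by this value, so the agent whose CNN reaches the lowest loss becomes the population's best.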

Research Methodology
The selection of optimal hyperparameters for a CNN has an impact on the results obtained. This study therefore uses the CWOA method to automate the selection of CNN hyperparameters; the stages of the research can be seen in Figure 2.

Comparison of CWOA, WOA, and Modified PSFHS
In this study, the results of the CNN hyperparameter search using the proposed method are compared with the modified PSFHS method and the WOA method under the same test parameters. The errors and accuracies of these methods can be seen in Table 5. For the MNIST model in Table 5, the CWOA method with Chaotic Map 6 has an error of 0.023 and an accuracy of 99.63; the WOA method has an error of 0.027 and an accuracy of 99.36; and the modified PSFHS method has an error of 0.035 and an accuracy of 99.19. Thus, the CWOA method with Chaotic Map 6 obtains the smallest error and the largest accuracy. For the FashionMNIST model in Table 6, the CWOA method with Chaotic Map 8 obtains an error of 0.23 and an accuracy of 91.36; the WOA method has an error of 0.24 and an accuracy of 91.00; and the modified PSFHS method has an error of 0.25 and an accuracy of 90.96. Again, the CWOA method, here with Chaotic Map 8, obtains the smallest error and the largest accuracy.

Comparison with State-of-the-art Methods
In addition to comparing the three methods used in this study, it is necessary to compare their results with several previous studies; these comparisons can be seen in Table 7, Table 8, and Table 9. Note that the previous studies (Bacanin et al., 2020; Gulcu & Kus, 2020; Lee et al., 2018; Strumberger et al., 2019; Y. Sun et al., 2020) have different test conditions, sharing only the datasets with this study. Moreover, the differences between the computational machines used in this study and in the previous studies make the comparison not entirely realistic. In Table 7, the test uses the MNIST dataset and compares error values between methods. The table shows that the CWOA method with Chaotic Map 6 has the smallest error, with the WOA method and the modified PSFHS method second and third. Table 7 thus shows that the proposed method has the smallest error compared to several previous methods that did not implement regularization, namely EvoCNN, TGA-CNN, -CNN, and EE-TGA-CNN. In Table 8, the test compares accuracy values between methods. The highest accuracy is obtained by the µO method, followed by the CWOA method with Chaotic Map 6 and then the WOA method. The modified PSFHS method obtains a lower accuracy than the previous study that used the same method and dataset, namely the modified PSF-HS-CNN. In Table 9, the test uses the FashionMNIST dataset and compares accuracy values. The greatest accuracies are obtained by the EvoCNN method and the µO method, both of which exceed the methods used in this study, namely the CWOA method with Chaotic Map 8, the WOA method, and the modified PSFHS method. More results from other methods can be seen in Appendix A.

Conclusion
Judging from the results of this study, the use of regularization when searching for CNN hyperparameters also affects the accuracy and error obtained during training. This is evidenced by the proposed method having a smaller error than several previous studies. However, the accuracy results differ: on the MNIST dataset the method in this study is not superior to all previous studies, and on the FashionMNIST dataset its accuracy is likewise not the greatest compared to previous studies.