Genetic Algorithm for Relational Database Optimization in Reducing Query Execution Time

The relational database is defined as the database by connecting between tables. Each table has a collection of information. The information is processed in the database by using queries, such as data retrieval, data storage, and data conversion. If the information in the table or data has a large size, then the query process to process the database becomes slow. In this paper, Genetic Algorithm is used to process queries in order to optimize and reduce query execution time. The results obtained are query execution with genetic algorithm optimization to show the best execution time. The genetic algorithm processes the query by changing the structure of the relation and rearranging it. The fitness value generated from the genetic algorithm becomes the best solution. The fitness used is the highest fitness of each experiment results. In this experiment, the database used is MySQL sample database which is named as employees. The database has a total of over 3,000,000 rows in 6 tables. Queries are designed by using 5 relations in the form of a left deep tree. The execution time of the query is 8.14247 seconds and the execution time after the optimization of the genetic algorithm is 6.08535 seconds with the fitness value of 0.90509. The time generated after optimization of the genetic algorithm is reduced by 25.3%. It shows that genetic algorithm can reduce query execution time by optimizing query in the part of relation. Therefore, query optimization with genetic algorithm can be an alternative solution and can be used to maximize query performance.


INTRODUCTION
The relational database is a complex system [1].Relational model is a model that is often used in processing the database, because there are interrelated information tables.In the relational model, data is formed in relation [2].Relational database information will be stored into collections in the table [3].Each table has data storage and will be reused.The processing of relational database uses SQL language, or often known as query.The SQL process is instructed by using Database Management System (DBMS) [3][4][5][6].
Structure Query Language (SQL) can be represented in query string [4].SQL or queries are made from operations performed on table [3].Query performance can slow down when executing multiple tables of information with large data in physical storage.If that happens, the query requires optimization with query optimizer or can also use optimization algorithm [7].Troubleshooting needs algorithm.Algorithm is not only used to solve easy problems, but it can also be used on complex issues.Algorithm for optimizations such as genetic algorithm is an easy method for optimization problems [2,5,8].There is a randomly solved method in the algorithm [9,10,11] and belongs to the soft computing model [12].The genetic algorithm was introduced in the 1960s, invented by John Holland and developed by David Goldberg [13,14].The genetic algorithm consists of individuals in the population [15].The concept is to perform a search technique to derive a solution based on the evolutionary process [16][17][18][19].In the process of evolution, individuals who can survive will be obtained, so that the individual who has experienced gene changes for many times, will be able to to adapt.Individual changes occur through breeding.The process of genetic algorithm and the like such as selection, crossover, and mutations to produce the best individuals [20].
Genetic algorithm is used to process database queries in order to obtain optimal queries and best execution time.The process of genetic algorithm is by rearranging the query on the part of the relation.Each relation is calculated for execution time to be processed with genetic algorithm and to create fitness value.Relations are decreased based on the execution time per relation after the genetic algorithm is complete.The database used is a sample database with name of employees and the table used is 6 table 5 relation.

METHOD 2.1. Experiment
The purpose of this experiment is to generate the best execution time by optimizing the query by using a genetic algorithm.The experiment is useful for experimenting with various possible genetic algorithm parameters such as crossover probabilities and mutation probabilities.This experiment uses 100 experiments to produce the best execution time.

System Development
Systems for genetic algorithms are created from the PHP programming language.The system is used as a genetic algorithm tool to process queries, the result is a new query by changing the structure of the relation.Flowchart of the system for genetic algorithm can be seen Figure 1.The execution time of the query above was 8.14347 seconds.

Genetic Algorithm for Database Query
Steps to optimize relational database query with genetic algorithm can be seen in the description below: 1) Population Formation The initial population produces an initial solution [21].The population is formed by arranging chromosomes in the individual, the individual is the problem solution of the genetic algorithm [10], each chromosome is represented as follows: 2) Fitness Evaluation Fitness shows the advantages of the individual [22].In fitness evaluation, the fitness value of each individual will be calculated with equation 1.

∑ (1)
Where as; W: the execution time each chromosome 3) Crossover The crossover process uses single point crossover or one cut point.The purpose of the crossover is to add variation in the population [20].The process with chromosomes in individuals is exchanged with other individuals to create new individuals [14].Crossover-Individuals are selected randomly based on the probability of crossover.Chromosome positions are also randomly selected to be exchanged.Single point crossover process can be seen in Figure 2.

4) Mutation
Mutation is a modification of chromosomes in individuals [23].Individuals are selected on the basis of mutation probabilities and the chromosomes selected in the mutation are also replaced by the pair of chromosomes.The mutation process can be seen in Figure 3.

5) Selection
After crossover, the next mutation process is selection.Selection is almost the same as natural selection with the survival of the fittest principle [16,17].The selection used is roulette wheel selection [24].In the roulette wheel, each individual is selected based on the number that appears in the roulette wheel.Selected individuals are included in the population.

6) Elitism
Elitism is used to store the best individuals with the highest fitness values and possibly will be reused in the next generation.

Experimental Result
Parameters for experiments with genetic algorithm are as follows: 1. Generation: 100 2. Population size: 120 3. The probability of crossover (PC) and mutation probability (MP) used 10 to 100 with the increment of 10.
The experiment results in this article showed

Comparison of Query and Execution Time
From the experiment results, the best results with the optimization of genetic algorithm and without genetic algorithm can be compared, it can be seen in Table 2.
Comparison of queries can be seen in Table 3.Where as; GA : optimization of genetic algorithm No GA : no genetic algorithm In Table 2, the time decreased 25.3% of the query execution time without the genetic algorithm.Table 3 of the FROM query showed the difference in the order, the structure of the order is reshaped in descending manner, based on the size of the tables in the database.WHERE query also showed the difference that the structure was arranged in descending manner, based on the execution time per relation, but the position of the table field for the relation was processed by genetic algorithm.

System Implementation
System of genetic algorithm was used as tools, user can fill some parameter of genetic algorithm and query to be processed to be optimized, it can be seen in Figure 4.After filling the parameters of the genetic algorithm, the system displayed the results of the genetic algorithm process in the form of query and execution time.The results shown were the results of per experiment in the system, if many experiments were performed then the user must do the same thing, but if it used the same query, the user sould only fill the parameters of the genetic algorithm.The results of the system can be seen in Figure 5.

Figure 1 .
Figure 1.The Flowchart of Genetic Algorithm System 3. RESULTS AND DISCUSSION 3.1.Query and Database for Experiment In this experiment, the database used is MySQL sample database with the name of employees.The database can be accessed at https://github.com/datacharmer/test_db,but the data contained in the database is selected first to get the near-to-real data.The query used is in the form of left deep tree [3], as follows: SELECT a.first_name, a.last_name, b.salary, c.title, d.dept_no, e.dept_name, f.emp_no FROM employees a, salaries b, titles c, dept_emp d, departments e, dept_manager f WHERE a.emp_no = b.emp_noAND b.emp_no = c.emp_noAND c.emp_no = d.emp_noAND d.dept_no = e.dept_noAND e.dept_no = f.dept_noAND a.first_name = "Ramzi" AND c.title = "Senior Engineer" AND b.salary BETWEEN 55025 AND 59700

Figure 4 .
Figure 4. Input of Genetic Algorithm Parameters

Figure 5 .
Figure 5. Experiment Result of The System

Table 1 ,
the best execution time was 6.08535 seconds with the fitness value of 0.90509.The result was the best experiment of 100 experiments.Query results with genetic algorithm optimization can be seen in below: SELECT a.first_name, a.last_name, b.salary, c.title, d.dept_no, e.dept_name, f.emp_no FROM salaries b, employees a, titles c, dept_emp d, dept_manager f, departments e