Tide Prediction in Prigi Beach using Support Vector Regression (SVR) Method

Purpose: Prigi Beach has the largest fishing port in East Java, but its topography is gently sloping, which makes it prone to disasters such as tidal flooding. Because tides strongly influence the occurrence of such flooding, tidal level information is essential. This study aims to provide tidal predictions for Prigi Beach. Methods: This study uses the Support Vector Regression (SVR) method. The input data were examined over time using partial autocorrelation function (PACF) plots to form the input data patterns. The working principle of SVR is to find the best hyperplane, in the form of a regression function, that produces the smallest error. Result: The best SVR model was built with the linear kernel, with a MAPE of 0.5510%, an epsilon of 0.0614, and a bias of 0.6015. The tidal predictions for Prigi Beach in September 2020 showed that the highest tide occurred on September 19, 2020, at 10.00 PM, and the lowest tide occurred on September 3, 2020, at 04.00 AM. Value: Experiments with three types of SVR kernels indicate that the linear kernel predicts tides better than the polynomial and Gaussian kernels.


INTRODUCTION
Indonesia is an archipelagic country in which almost 70% of the territory is sea [1]. Therefore, to maximize the use of marine resources, it is necessary to refer to information on oceanographic parameters. Oceanographic parameters include waves [2], tides [3], currents [4], and others that are influenced by weather in the sea area [5]. Tidal information is considered crucial because tides generally affect the speed of ocean currents [6]. Tidal information is also needed by agencies that manage natural disasters such as tsunamis and El Nino storms [7]. For coastal communities, tidal information is vital to the survival of people who depend on the sea, one of which is the coastal community of Prigi [8].
Prigi Beach is located in Trenggalek Regency, East Java Province. Prigi Beach has the largest fishing port in East Java, but its topography is gently sloping, so it is vulnerable to tidal flooding [9]. In 2016 the Prigi Beach area experienced tidal flooding, which occurred because the tide reached a height of 2-3 meters above sea level (Source: Merdeka.com). Therefore, it is necessary to predict the tides so that such events can be anticipated in the future and marine resources on the coast of Prigi can be used to the fullest.
Building a tidal prediction system cannot be separated from time series analysis. One applicable prediction method is Support Vector Regression (SVR) [10]. The working principle of SVR is to find the best hyperplane, in the form of a regression function, that produces the smallest possible error. In addition, SVR is a suitable method for data with non-linear characteristics [11].
Tidal prediction has previously been carried out using the Backpropagation method. That study obtained an MSE of 0.0035861 with 70% training data, five nodes in the hidden layer, a learning rate of 0.9, and a target error of 0.01 [12]. Another study combined the Autoregressive Integrated Moving Average (ARIMA) method with a deep belief network to predict tides. The resulting MAPE was 17.21%, which places the predictions only in the good category [13].
Another study compared the SVR method with ANN to predict groundwater levels in limited and unlimited systems. That study found that the SVR method is superior to the ANN method for estimating groundwater levels for the next 1, 2, and 3 months [14]. A study using the SVR method to predict building energy in South China obtained an MSE of less than 0.01 using seven parameters [15]. A further study compared SVR with multiple linear regression for the effect of weather on rice productivity in Indonesia and found that the SVR method is superior, with an RMSE of 25.8503 and an MAE of 20.912 [16]. These previous studies show that SVR is suitable for prediction. However, preprocessing the data beforehand can further increase accuracy. A PACF plot can be used to check the time-series input data [17]. Therefore, this study uses the PACF in data processing to improve the accuracy of SVR.
Based on this background, the purpose of this study is to provide tidal predictions for the Prigi coast using the SVR method, with the initial data processed using the PACF to form the time series. This research is expected to serve as a reference for tidal prediction.

Fill-Missing
In the data collection process, blank (missing) data are often encountered, for example because of technical errors, so a calculation is needed to fill in the blanks while taking the time-series pattern into account. The interpolation method is suitable for filling blank data in time series [18]. The interpolation method can be formulated using Equation (1).
$$y = y_1 + \frac{x - x_1}{x_2 - x_1}\,(y_2 - y_1) \qquad (1)$$
Where $x$ is the order of the data point that contains the blank value, $x_1$ is the order of the data point before the blank, $x_2$ is the order of the data point after the blank, $y$ is the interpolated value, $y_1$ is the value of the data point before the blank, and $y_2$ is the value of the data point after the blank. Equation (1) expresses that the value before the blank, the value after the blank, and the blank value itself are related; this relationship is called the time-series pattern.
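As an illustration of this fill-missing step, the following is a minimal Python sketch of linear interpolation between the nearest known neighbours of a blank value. The function name `fill_missing`, the use of `None` to mark blanks, and the sample values are assumptions made for this example, not the authors' implementation. Copying the nearest known value when a gap sits at the very start or end of the series is also an assumption, since two-sided interpolation is not possible there.

```python
from typing import List, Optional


def fill_missing(values: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between the nearest known neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one observed value is required")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            x1, x2 = before[-1], after[0]      # indices of the neighbours
            y1, y2 = values[x1], values[x2]    # observed values at the neighbours
            filled[i] = y1 + (i - x1) / (x2 - x1) * (y2 - y1)
        else:
            # Assumption: a gap at the edge of the series copies the nearest known value.
            nearest = after[0] if after else before[-1]
            filled[i] = values[nearest]
    return filled


# Hypothetical hourly tide heights in metres with three blank readings.
hourly = [1.2, 1.4, None, 1.8, None, None, 1.2]
print(fill_missing(hourly))
```

A run of consecutive blanks is handled by interpolating each blank against the same pair of known neighbours, so the filled values lie on one straight line.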

Partial Autocorrelation function (PACF)
The Partial Autocorrelation function (PACF) at lag $k$ is the correlation between $Z_t$ and $Z_{t+k}$ after the linear dependence on the intermediate variables $Z_{t+1}, Z_{t+2}, \ldots, Z_{t+k-1}$ is removed. Partial autocorrelation is used to measure the level of closeness (association) between $Z_t$ and $Z_{t+k}$ when the effects of time lags $1, 2, 3, \ldots, k-1$ are considered separately [19].
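To make the definition concrete, the sketch below computes sample partial autocorrelations with the Durbin-Levinson recursion in plain Python. The recursion is one standard way to obtain the PACF from the sample autocorrelations; it is not stated in the source that the authors used it. The function names and the synthetic 12-hour sinusoid with noise are assumptions for illustration only.

```python
import math
import random
from typing import List


def autocorrelation(series: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations rho_0..rho_max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def partial_autocorrelation(series: List[float], max_lag: int) -> List[float]:
    """Sample PACF at lags 0..max_lag via the Durbin-Levinson recursion."""
    rho = autocorrelation(series, max_lag)
    result = [1.0]
    phi_prev: List[float] = []  # phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        result.append(phi_kk)
    return result


# Hypothetical hourly series: a 12-hour oscillation plus small noise.
random.seed(0)
series = [math.sin(2 * math.pi * t / 12.0) + 0.1 * random.gauss(0.0, 1.0)
          for t in range(240)]
for lag, value in enumerate(partial_autocorrelation(series, 6)):
    print(lag, round(value, 3))
```

Lags whose partial autocorrelation stands out from the rest are the candidates for input features of the regression model.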

Support Vector Regression (SVR)
SVR is a development of the Support Vector Machine (SVM) method for solving regression cases [20]. The purpose of SVR is to obtain the function $f(x)$ as a dividing line (hyperplane) in the form of a regression function fitted to the input data. The linear function of the SVR method can be formulated using Equation (2) [21].
$$f(x) = w^{T}\varphi(x) + b \qquad (2)$$
Where $f(x)$ is the regression function; $w$ is a weight vector of dimension $l$, in other words the normal of the hyperplane, which serves to balance the errors against the hyperplane; $\varphi(x)$ is obtained by mapping the low-dimensional input vector to a point in a high-dimensional feature space; and $b$ is the bias, which is the position of the plane relative to the coordinate center [22].
Determining the parameter values of $w$ and $b$ becomes a quadratic programming problem. Optimization problems with constraints can be handled with the Lagrange method. The optimal solution can be obtained from the Lagrange multiplier equation, which is formulated using Equation (3) [23].
From the process of deriving Equation (3), the main variables are $\alpha$ and $\alpha^*$. The solution to this problem is derived for the vector $w$, which is then substituted into the function $f(x)$.
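For orientation, the standard textbook dual form of the ε-insensitive SVR problem is sketched below. It is not necessarily identical to the authors' Equation (3). The regularization constant $C$, the training targets $y_i$, and the number of training points $n$ are symbols assumed here for illustration.

```latex
\begin{aligned}
\max_{\alpha,\,\alpha^*}\quad
  & -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
      (\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)\,
      \varphi(x_i)^{T}\varphi(x_j)
    \;-\; \varepsilon\sum_{i=1}^{n}(\alpha_i+\alpha_i^*)
    \;+\; \sum_{i=1}^{n} y_i\,(\alpha_i-\alpha_i^*) \\
\text{subject to}\quad
  & \sum_{i=1}^{n}(\alpha_i-\alpha_i^*) = 0,
    \qquad 0 \le \alpha_i,\ \alpha_i^* \le C, \\
\text{with}\quad
  & w = \sum_{i=1}^{n}(\alpha_i-\alpha_i^*)\,\varphi(x_i).
\end{aligned}
```

Substituting the last line into the regression function expresses the prediction purely through inner products of mapped inputs, which is what allows a kernel to replace the explicit mapping.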
The algorithm can be adjusted for non-linear regression problems by adding a kernel [24]. The observations in SVR can be mapped to a higher dimension in which they have a linear structure, without requiring the explicit mapping. The regression model for non-linear cases can be formulated in Equation (5) [25].
$$f(x) = \sum_{i=1}^{n} \beta_i\, K(x_i, x) + b \qquad (5)$$
Where $\beta_i$ is the difference between $\alpha_i$ and $\alpha_i^*$, $n$ is the number of training data, and $K(x_i, x)$ is the kernel trick that is often used in SVM and SVR methods.

Kernel
Linear and non-linear SVM problems can be solved by adding a kernel function, formulated in Equations (6), (7), and (8).
a. Linear kernel: $K(x_i, x) = x_i \cdot x \qquad (6)$
b. Polynomial kernel: $K(x_i, x) = (x_i \cdot x + 1)^{p} \qquad (7)$
c. Gaussian kernel: $K(x_i, x) = \exp\!\left(-\dfrac{\|x_i - x\|^{2}}{2\sigma^{2}}\right) \qquad (8)$
The parameter $p > 0$ is a constant, $\|x_i - x\|$ is the Euclidean distance, and $\sigma$ is the value of the SVR parameter [26]. The three kernel functions differ in how they map the data to the feature space. Each kernel has advantages and disadvantages in particular cases, so the best kernel function for a given case must be found by trial and error [27].
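The three kernels and the kernel form of the regression function can be sketched in plain Python as follows. The exact parameterization is an assumption: the polynomial kernel is written with an exponent `degree`, and the Gaussian kernel with the common $2\sigma^2$ denominator, which may differ from the authors' equations. The support vectors, coefficients, and bias in the usage lines are invented values for illustration.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def linear_kernel(a: Vector, b: Vector) -> float:
    """Dot product of the two input vectors."""
    return sum(x * y for x, y in zip(a, b))


def polynomial_kernel(a: Vector, b: Vector, degree: int = 2) -> float:
    """(a . b + 1) raised to an assumed polynomial degree."""
    return (linear_kernel(a, b) + 1.0) ** degree


def gaussian_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """exp(-||a - b||^2 / (2 sigma^2)), an assumed common form."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def svr_predict(x: Vector,
                support_vectors: Sequence[Vector],
                betas: Sequence[float],
                bias: float,
                kernel: Callable[[Vector, Vector], float]) -> float:
    """Kernel expansion: sum_i beta_i * K(x_i, x) + bias."""
    return sum(beta * kernel(sv, x)
               for sv, beta in zip(support_vectors, betas)) + bias


# Hypothetical support vectors, coefficients, and bias.
svs = [[1.0, 0.0], [0.0, 1.0]]
betas = [0.5, -0.25]
for kernel in (linear_kernel, polynomial_kernel, gaussian_kernel):
    print(kernel.__name__, svr_predict([2.0, 4.0], svs, betas, 0.1, kernel))
```

Only the `kernel` argument changes between the three cases, which mirrors the trial-and-error comparison of kernels described above.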

MAPE (Mean Absolute Percent Error)
MAPE is obtained by comparing the predicted data with the actual data, expressed in percent. The MAPE value can be formulated using Equation (9) [28].
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{e_t}{y_t}\right| \times 100\% \qquad (9)$$
Where $e_t$ is the result of subtracting the predicted value at time $t$ from the actual value at time $t$, $y_t$ is the actual value at time $t$, and $n$ is the amount of data. The value categories of MAPE can be seen in Table 1.
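A minimal Python sketch of the MAPE calculation follows. It assumes the conventional definition in which each absolute error is divided by the actual value, so it is undefined when an actual value is zero. The function name `mape` and the sample numbers are illustrative only.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical actual and predicted tide heights in metres.
actual = [2.0, 4.0]
predicted = [1.0, 5.0]
print(mape(actual, predicted))  # (|1/2| + |-1/4|) / 2 * 100 = 37.5
```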

Technical Research
The flowchart of tide prediction at Prigi Beach using SVR is shown in Figure 1.

Figure 1. Flowchart of Tide Prediction System in Prigi Beach using SVR
The first step in predicting tides is to determine the input data. Blank values are filled using Equation (1), and the data are then arranged into time-series input patterns according to the PACF plot, following the time sequence. After that, the data are divided into training and testing sets with a ratio of 80%:20%. The parameters used in the training process are then initialized. In the training process, the SVR model is obtained using Equation (5), and this model is later used in the testing process. The last step is to evaluate the model using the MAPE in Equation (9).
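The arrangement of the series into input patterns and the 80%:20% division can be sketched as below. The helper names are hypothetical. The lag set `(1, 2, 3, 5)` is only one possible reading of "four predictors with the 4th lag removed" and is not confirmed by the source. The split is chronological (no shuffling), which is the usual choice for time series and is likewise an assumption here.

```python
from typing import List, Sequence, Tuple


def make_lagged(series: Sequence[float],
                lags: Sequence[int]) -> Tuple[List[List[float]], List[float]]:
    """Build predictor rows [series[t-k] for k in lags] with target series[t]."""
    max_lag = max(lags)
    rows = [[series[t - k] for k in lags] for t in range(max_lag, len(series))]
    targets = [series[t] for t in range(max_lag, len(series))]
    return rows, targets


def chronological_split(rows: list, targets: list, train_ratio: float = 0.8):
    """Keep the first train_ratio share for training and the rest for testing."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], targets[:cut], rows[cut:], targets[cut:]


# Placeholder for 720 hourly observations (30 days); values are just indices.
series = [float(t) for t in range(720)]
X, y = make_lagged(series, lags=(1, 2, 3, 5))  # assumed lag set
X_train, y_train, X_test, y_test = chronological_split(X, y, 0.8)
print(len(X), len(X_train), len(X_test))
```

Because the first usable target needs the largest lag to exist, the number of patterns is smaller than the number of raw observations.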

RESULT AND DISCUSSION
The area used as the research location is Prigi Beach, located in Trenggalek Regency, East Java Province, as shown in Figure 2. The topography of the Prigi Beach area is gently sloping, which causes tidal flooding in the coastal area [9]. This study uses secondary data in the form of tidal data from August 2 to September 1, 2020, obtained from the Meteorological, Climatological, and Geophysical Agency, Surabaya. The prediction uses 720 hourly data points, that is, hourly data for 30 days. Table 2 shows an example of tidal data for the Prigi coast in rad (m).
The first process in studying tidal predictions at Prigi Beach is data analysis. In this study, there is a preprocessing stage, namely fill-missing. Table 2 shows blank data at 6 AM, which must be filled using Equation (1). The result is presented in Table 3. This study has four predictor variables (x) and one response variable (y). Table 4 shows the time-series data formed according to the PACF plot. These data are used in the prediction process, with the lagged values (back to $t-5$) as input data and the value at time $t$ as the prediction target. The PACF plot shows the data pattern at the 4th lag, so every 4th time-series pattern is deleted. The ratio of training data to test data is 80%:20%.
The second process is training, in which the prediction model is formulated using Equation (5). The third process is model testing to determine the level of accuracy. The SVR model is formulated with several types of kernels, namely the linear, polynomial, and Gaussian kernels. The error rate is then measured using Equation (9). The comparison of the error values for each kernel is presented in Table 5. The actual data and the prediction results for the linear, polynomial, and Gaussian kernels are visualized in Figures 3, 4, and 5. In Table 5, which compares the prediction results of the three kernels, the value of ε obtained is 0.0614; ε is the distance between the hyper tube and the hyperplane.
The prediction results of the three kernel trials show that the linear kernel produces the best accuracy. The resulting MAPE value is 0.5510%, the value of β is [0.446;-0.335;-0.647;1.441], and the bias is 0.6015. Because the MAPE is less than 10%, it falls in the very good category. Saputra, G. H., et al. also found in their research that the linear kernel has fairly high prediction accuracy and produces relatively small errors in the prediction process [29].
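With a linear kernel, the fitted model reduces to a weighted sum of the four inputs plus the bias. The sketch below applies the reported coefficients and bias to a made-up input vector. The input values are hypothetical, and the order in which the four coefficients correspond to the lagged inputs, as well as any scaling applied to the data before training, is not stated in the source and is assumed away here.

```python
from typing import Sequence

# Coefficients and bias as reported for the linear-kernel model.
BETA = [0.446, -0.335, -0.647, 1.441]
BIAS = 0.6015


def predict_linear(inputs: Sequence[float],
                   beta: Sequence[float] = BETA,
                   bias: float = BIAS) -> float:
    """Linear SVR prediction: dot product of beta and inputs, plus bias."""
    if len(inputs) != len(beta):
        raise ValueError("expected one input per coefficient")
    return sum(b * x for b, x in zip(beta, inputs)) + bias


# Hypothetical four previous tide heights (metres), in an assumed order.
previous_heights = [1.0, 1.1, 1.2, 1.3]
print(round(predict_linear(previous_heights), 4))
```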
The value of β is the result of subtracting $\alpha^*$ from $\alpha$. For the non-linear kernels, namely the Gaussian and polynomial kernels, this value cannot be displayed explicitly because of the difficulty of the calculation process. Smola stated in his research that "Even when evaluating $f(x)$ we need not compute $w$ explicitly" [32]. The error in this study is very small because the data are pure tidal data without the influence of other components.
This study uses training data from August 2, 2020, at 7 PM to August 26, 2020, at 2 PM. The test results run from August 26, 2020, at 8 PM until September 1, 2020, at 6 PM, with a one-hour-ahead output. Based on the best model, the predicted tides at Prigi Beach in September 2020 show that the highest tide occurred on September 19, 2020, at 10 PM, and the lowest tide occurred on September 3, 2020, at 4 AM. This study produces only one output, whereas in practice the community needs tidal predictions for at least the next 3 hours. Therefore, the authors expect further research to use a method that can produce more than one output, for example, Multiple-Output Support Vector Regression (M-SVR) [30].

CONCLUSION
Based on the results of tidal research on the Prigi coast, it can be concluded that using the SVR method for tidal prediction with four previous tidal values as parameters and a comparison of three kernels, namely the Gaussian, polynomial, and linear kernels, produces a very small MAPE. The smallest MAPE, 0.5510%, is obtained with the linear kernel and falls in the very good category. Based on the best model, the predicted tides at Prigi Beach in September 2020 show that the highest tide occurred on September 19, 2020, at 10 PM, and the lowest tide occurred on September 3, 2020, at 4 AM.