An Approach to Measure the Death Impact of Covid-19 in Jakarta using Autoregressive Integrated Moving Average (ARIMA)

Coronavirus disease 2019 (COVID-19) is a pandemic affecting more than 200 countries around the world. As the fourth most populous nation in the world, Indonesia is predicted to face a serious threat from this pandemic, particularly in Jakarta, the epicenter of the virus in Indonesia. However, the ease with which COVID-19 spreads and the many undetected cases that show no symptoms make it difficult to determine the real mortality effect of COVID-19. Deaths in Jakarta from the new coronavirus may be higher than officially reported. To address this issue, this paper provides an approach to measure the death impact of COVID-19 using the Autoregressive Integrated Moving Average (ARIMA) model. The model predicts the 'what if' normal number of funerals in Jakarta and compares it with the actual number in March 2020 as an estimate of the actual effect of COVID-19 in Jakarta. This research revealed a discrepancy of 450 to 1,070 funerals in March 2020 that the ARIMA model could not predict. This funeral gap, a forecast error, may approximate the potential death impact of COVID-19 in Jakarta, which is likely significantly higher than reported. People should be more aware of and alert to the COVID-19 situation.
©2020 Universitas Negeri Semarang pISSN 2252-6781 eISSN 2548-7604
Article Info. Article History: Submitted April 2020, Accepted June 2020, Published July 2020


INTRODUCTION
Coronavirus disease 2019 (COVID-19) is a pandemic that has spread to more than 200 countries around the world. This infectious respiratory disease was first identified in Wuhan, People's Republic of China (Setiati & Azwar, 2020). As of 31 March 2020, there were 754,933 confirmed cases and 36,522 deaths worldwide (WHO, 2020).
As the fourth most populous nation in the world, Indonesia is predicted to face a serious threat from this pandemic. However, Indonesia did not announce any cases of infection through February 2020; only on 2 March did President Joko Widodo report the first two confirmed cases of COVID-19. The infection was traced to a visit to Indonesia by Japanese citizens living in Malaysia (Tosepu, et al., 2020). As of April 1, the nation had recorded 1,677 confirmed cases, 157 deaths, and 103 recoveries (Indonesian Health Ministry, 2020).
As the capital city of Indonesia, Jakarta has been heavily affected and is considered the epicenter of the virus in Indonesia, recording 808 cases and 85 deaths by April 1, more than any other province (Indonesian Health Ministry, 2020). However, epidemiologists say that a relatively low level of testing means the number of cases appears to have been vastly underreported. This has probably happened in Jakarta: deaths in Jakarta from the new coronavirus may be higher than officially reported (Jefriando & Munthe, 2020).
The ease with which COVID-19 spreads, together with the many undetected cases that show no symptoms, makes it even harder to determine the real mortality effect of COVID-19. The number of officially confirmed COVID-19 cases is probably inaccurate, as many people affected by the virus might have died before they even reached a hospital. Nevertheless, measuring vulnerability and the effect of COVID-19 on death rates is important for developing and implementing pandemic management strategies, such as appropriate preparedness and effective responses.
To overcome this issue, this paper provides an approach to measure the death impact of COVID-19 using the Autoregressive Integrated Moving Average (ARIMA) model. Whereas other studies measure the mortality effect of COVID-19 from officially shared real-time data, this study offers an alternative estimate of its real mortality effect in Jakarta in March 2020. The paper has two objectives: first, to find the model that best fits the number of deaths in Jakarta from 2010 to 2018; second, to measure the death impact of COVID-19 in Jakarta from 2019 to March 2020.

Data Sources
The number of funerals was used as an approximation of the death toll in Jakarta. It was compiled from the records of the Jakarta Department of Parks and Cemeteries from 2010 to 2020, available at https://pertamananpemakaman.jakarta.go.id/.
The monthly funeral totals, which were the base data for the time-series model, were then divided by the number of days in each month to obtain the average number of funerals per day.

Data Set
The data were split into a training set, used to build the models, and a test set, used to evaluate their predictive validity. In this study, the data from 2010 to 2018 formed the training set and the data from January 2019 to March 2020 formed the test set.

ARIMA Model
ARIMA models are, in theory, the most general class of models for forecasting a time series that can be made stationary by differencing. The procedure involves fitting an appropriate model, estimating its parameters, and verifying the model (Anokye, et al., 2018). ARIMA models have been widely used for forecasting in epidemiology. They have been applied to predict the frequency of influenza A virus in swine in Ontario, Canada (Pethukova, et al., 2018) and in China (He & Tao, 2018). Seasonal ARIMA has also been used to find the best-fitting model for, and to analyze, dengue fever cases (Pamungkas & Wibowo, 2019; Rubaya, et al., 2018). In addition, the ARIMA model has proved an effective way to forecast the incidence of hand, foot and mouth disease (HFMD) in China (Yu, et al., 2014).
The monthly series of average funerals per day from 2010 to 2018 was used to build the ARIMA model, which was then used to forecast funerals from 2019 to March 2020 and to assess the model's applicability. Given a stationary time series $Y' = (Y_1, Y_2, \dots, Y_n)$, an autoregressive moving average model, denoted ARMA(p, q), consists of two parts: an autoregressive (AR) part of order p and a moving average (MA) part of order q. The ARMA(p, q) model is given by

$$Y_t = \mu + \sum_{i=1}^{p} \phi_i Y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j},$$

where $\mu$ is a constant, $\phi' = (\phi_1, \phi_2, \dots, \phi_p)$ is the vector of autoregressive coefficients, $\theta' = (\theta_1, \theta_2, \dots, \theta_q)$ is the vector of moving average coefficients, and the error terms $\varepsilon_t$ are assumed to be independent, identically distributed random variables drawn from a distribution with mean zero and variance $\sigma_\varepsilon^2$. In time series analysis, the $\varepsilon_t$ are commonly referred to as white noise and are interpreted as exogenous effects that the model cannot explain (Martinez & Silva, 2011).
The time series in the ARIMA model should be a stationary stochastic sequence with zero mean (Wang, 2019). A non-stationary series must therefore be transformed into a stationary one by differencing, which turns the ARMA model into an ARIMA model. Specifically, if d denotes the order of differencing, the model is written as ARIMA(p, d, q) when it has no seasonal component, ARIMA(sp, sd, sq) for the seasonal component alone, and ARIMA(p, d, q)(sp, sd, sq) for the combined (complex) model (Wang, 2019).
The complex model, which is suitable for a general sequence, is the most advantageous of these. It is therefore necessary to determine the orders p, d, q, sp, sd, and sq to build the ARIMA model. Generally, building an ARIMA model involves successive steps: checking stationarity, identification, estimation and diagnosis, and forecasting (Wang, 2019).
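To make the ARIMA(p, d, q)(sp, sd, sq) notation concrete, the sketch below fits the ARIMA(0,1,1)(0,0,2)12 candidate discussed in this paper with base R's `arima()`. It assumes a simulated stand-in series `y` (a trend plus a 12-month cycle plus noise), not the Jakarta funeral data, so the estimates it prints say nothing about Jakarta.

```r
# Hypothetical stand-in for the monthly funerals-per-day series:
# simulated trend + 12-month cycle + noise, NOT the Jakarta data.
set.seed(2020)
n   <- 108                                   # Jan 2010 to Dec 2018
idx <- seq_len(n)
y <- ts(80 + 0.2 * idx + 8 * sin(2 * pi * idx / 12) + rnorm(n, sd = 3),
        start = c(2010, 1), frequency = 12)

# ARIMA(p, d, q)(sp, sd, sq)_12 in base R:
#   order           = c(p, d, q)     non-seasonal part
#   seasonal$order  = c(sp, sd, sq)  seasonal part
#   seasonal$period = 12             months per seasonal cycle
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 0, 2), period = 12))

print(coef(fit))                             # ma1, sma1, sma2
print(c(AIC = AIC(fit), BIC = BIC(fit)))     # criteria used for model selection
```

Because d = 1, `arima()` fits no constant term, so the differenced series is modelled with μ = 0; R also writes the MA terms with a plus sign, matching the ARMA equation above.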

(1). Sequence Stationarity
The time series (monthly funerals per day from 2010 to 2018) was found to be non-stationary. The standard differencing method was therefore applied successively to transform it into a stationary series (Wang, 2019).
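The differencing step can be sketched in base R with `diff()`. The series `y` is again a simulated stand-in (an assumption, not the paper's data); the sketch compares the sample autocorrelation at lags 1, 12, and 24 before and after a first difference, the same lags the paper inspects in the ACF.

```r
# Simulated stand-in series (hypothetical, not the Jakarta data)
set.seed(2020)
n   <- 108
idx <- seq_len(n)
y <- ts(80 + 0.2 * idx + 8 * sin(2 * pi * idx / 12) + rnorm(n, sd = 3),
        start = c(2010, 1), frequency = 12)

dy  <- diff(y)               # first (non-seasonal) difference: d = 1
sdy <- diff(y, lag = 12)     # seasonal difference: sd = 1, shown only for comparison

# Sample autocorrelations at selected lags (element 1 of $acf is lag 0)
acf_at <- function(x, lags) {
  a <- acf(x, lag.max = max(lags), plot = FALSE)$acf
  setNames(a[lags + 1], paste0("lag", lags))
}

print(round(acf_at(y,  c(1, 12, 24)), 2))   # raw series: high values from trend and cycle
print(round(acf_at(dy, c(1, 12, 24)), 2))   # differenced: lag 1 drops; seasonal lags may remain
```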
Plots of the original and the transformed series were then used to judge stationarity and pattern, and stationarity was tested formally with the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) stationarity test in R.
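In R the KPSS test is commonly run with `kpss.test()` from the `tseries` package. As a dependency-free illustration, the level-stationarity KPSS statistic can also be computed directly: demean the series, take partial sums, and divide by a Bartlett-weighted long-run variance. The function `kpss_level` below is a hypothetical hand-rolled sketch using the short lag rule trunc(4(n/100)^(1/4)), applied to a simulated stand-in series; 0.463 is the published 5% critical value for the level version. Note that the KPSS null hypothesis is stationarity, so a statistic below the critical value means stationarity is not rejected.

```r
# KPSS statistic for level stationarity (null hypothesis: the series is stationary).
kpss_level <- function(x, lags = trunc(4 * (length(x) / 100)^0.25)) {
  x <- as.numeric(x)
  n <- length(x)
  e <- x - mean(x)                    # residuals from a constant-level model
  s <- cumsum(e)                      # partial sums S_t
  lrv <- sum(e^2) / n                 # long-run variance: lag-0 term
  for (l in seq_len(lags)) {          # add Bartlett-weighted autocovariances
    w <- 1 - l / (lags + 1)
    lrv <- lrv + 2 * w * sum(e[(l + 1):n] * e[1:(n - l)]) / n
  }
  sum(s^2) / (n^2 * lrv)
}

# Simulated stand-in series (hypothetical, not the Jakarta data)
set.seed(2020)
n   <- 108
idx <- seq_len(n)
y <- ts(80 + 0.2 * idx + 8 * sin(2 * pi * idx / 12) + rnorm(n, sd = 3),
        start = c(2010, 1), frequency = 12)

crit_5pct <- 0.463                    # 5% critical value, level-stationarity KPSS
stat_raw  <- kpss_level(y)
stat_diff <- kpss_level(diff(y))
print(c(raw = stat_raw, differenced = stat_diff))
print(stat_diff < crit_5pct)          # TRUE: stationarity not rejected at the 5% level
```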
(2). Identification
First, the randomness, stationarity, and seasonal features of the time series were examined using the autocorrelation function (ACF) and the partial autocorrelation function (PACF). Next, since model orders rarely exceed 2, the candidate orders were restricted to the range 0 to 2 and compared using the AIC and BIC. Finally, several candidate models were formed from different combinations of 0, 1, and 2, and the model with the minimum AIC and BIC was selected (Wang, 2019).
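The order search described above can be sketched as a grid over p, q, sp, and sq in {0, 1, 2}, with d = 1 and sd = 0 as in this paper's candidates, keeping the AIC and BIC of every model that fits. The series is a simulated stand-in (an assumption), and skipping models that fail to converge is a design choice of this sketch, not something the text specifies.

```r
# Simulated stand-in series (hypothetical, not the Jakarta data)
set.seed(2020)
n   <- 108
idx <- seq_len(n)
y <- ts(80 + 0.2 * idx + 8 * sin(2 * pi * idx / 12) + rnorm(n, sd = 3),
        start = c(2010, 1), frequency = 12)

# Candidate orders 0..2 for p, q, sp, sq with d = 1 and sd = 0 (81 models)
grid <- expand.grid(p = 0:2, q = 0:2, sp = 0:2, sq = 0:2)
grid$AIC <- NA_real_
grid$BIC <- NA_real_

for (i in seq_len(nrow(grid))) {
  g <- grid[i, ]
  fit <- tryCatch(
    suppressWarnings(arima(y, order = c(g$p, 1, g$q),
                           seasonal = list(order = c(g$sp, 0, g$sq), period = 12))),
    error = function(e) NULL)          # some combinations fail to fit; skip them
  if (!is.null(fit)) {
    grid$AIC[i] <- AIC(fit)
    grid$BIC[i] <- BIC(fit)
  }
}

best_aic <- grid[which.min(grid$AIC), ]   # which.min() ignores NA entries
best_bic <- grid[which.min(grid$BIC), ]
print(best_aic)
print(best_bic)
```

AIC and BIC do not always agree; when they point to different orders, the analyst still has to choose between the two candidates, and BIC tends to favour the smaller model.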

(3). Estimation and Diagnosis
The adequacy of each candidate model was checked by testing its residual series e_t, the difference between the actual and the fitted number of funerals. For an appropriate model, e_t should be white noise, which was checked with the Box-Ljung test: the residuals should be random, with no statistically significant autocorrelation. Following the principle of uncorrelated residuals, a model was considered suitable for forecasting if its residual series was white noise; otherwise, the model was re-identified and improved (Wang, 2019).
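A minimal version of this residual check uses base R's `Box.test()` with `type = "Ljung-Box"`. The series is a simulated stand-in, and the lag of 24 (two seasonal cycles) is an assumed choice, since the text does not state which lag was used. `fitdf` is set to the number of estimated ARMA coefficients so that the chi-squared degrees of freedom are adjusted.

```r
# Simulated stand-in series (hypothetical, not the Jakarta data)
set.seed(2020)
n   <- 108
idx <- seq_len(n)
y <- ts(80 + 0.2 * idx + 8 * sin(2 * pi * idx / 12) + rnorm(n, sd = 3),
        start = c(2010, 1), frequency = 12)

fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 0, 2), period = 12))
res <- residuals(fit)                     # e_t: actual minus fitted values

# Ljung-Box test on the residuals; fitdf = number of estimated ARMA terms
# (ma1, sma1, sma2), so the chi-squared df is lag - fitdf = 24 - 3 = 21.
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 3)
print(lb)

# Decision rule from the text: keep the model if the residuals look like white noise
is_white_noise <- lb$p.value > 0.05
print(is_white_noise)

# Normality check of the residuals, as also reported in the paper
print(shapiro.test(as.numeric(res)))
```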

(4). Forecasting and Assessment
The optimal ARIMA model was used to forecast the monthly funeral data for the test period, 2019 to March 2020. The fit of the model was assessed by checking whether the actual values fell within the 95% prediction interval of the predicted values. The difference between the prediction and the test-set data is called the forecast error; it may be interpreted as the effect of an unpredictable situation that occurred during that period. The forecast error is distinct from the residual error, which measures the difference between the fitted values and the training data used to build the model.
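The train/test split and forecast-error step can be sketched as follows. Everything here is an assumption for illustration: the series is simulated, a hypothetical shock of +40 funerals per day is added to the last month to mimic an unusual March 2020, the interval uses the normal multiplier 1.96, and the per-day excess over the upper limit is converted to a monthly count by multiplying by the days in each month. That last step is one plausible way to express the gap in funerals, not necessarily the authors' exact computation.

```r
# Simulated stand-in series, Jan 2010 to Mar 2020 (hypothetical, not the Jakarta data)
set.seed(2020)
n_all <- 123
idx   <- seq_len(n_all)
y_all <- 80 + 0.2 * idx + 8 * sin(2 * pi * idx / 12) + rnorm(n_all, sd = 3)
y_all[n_all] <- y_all[n_all] + 40             # hypothetical shock in the last month
y_all <- ts(y_all, start = c(2010, 1), frequency = 12)

train <- window(y_all, end = c(2018, 12))     # 108 months: model building
test  <- window(y_all, start = c(2019, 1))    # 15 months: evaluation

fit <- arima(train, order = c(0, 1, 1),
             seasonal = list(order = c(0, 0, 2), period = 12))
fc    <- predict(fit, n.ahead = length(test))
lower <- fc$pred - 1.96 * fc$se               # approximate 95% prediction interval
upper <- fc$pred + 1.96 * fc$se

inside <- test >= lower & test <= upper       # did the actual value fall in the band?

# Per-day excess over the upper limit, scaled by days in month to a count of funerals
months <- seq(as.Date("2019-01-01"), by = "month", length.out = 16)
days   <- as.numeric(diff(months))            # 15 month lengths, Jan 2019 to Mar 2020
gap    <- pmax(as.numeric(test - upper), 0) * days

print(data.frame(month  = format(months[1:15], "%Y-%m"),
                 actual = round(as.numeric(test), 1),
                 upper  = round(as.numeric(upper), 1),
                 inside = as.logical(inside),
                 excess_funerals = round(gap)))
```

Measuring the gap against the upper limit rather than the point forecast gives a conservative figure, which matches the text's comparison of the actual value with each model's upper limit.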

Statistical Analysis
R was used to define the time variable, analyze the time series, and test stationarity. It was also used to plot the series and its autocorrelations, run the Box-Ljung test, and evaluate the fitted ARIMA models (standard errors, log-likelihood, AIC, BIC, and residual errors).

RESULTS AND DISCUSSION
Descriptive Analysis: Funerals per day in Jakarta from 2010 to 2019
(1). Funerals in Jakarta from 2010 to 2019
As shown in Figure 1, a total of 318,770 funerals were recorded from January 2010 to March 2020, with a wave-like upward trend year by year. Excluding the March 2020 outlier, the median monthly number of funerals was 2,584, with a minimum of 1,973 in December 2011 and a maximum of 3,404 in March 2016. In March 2020, the number of funerals reached 4,377, the highest in ten years.
As shown in Figure 2, funerals per day also showed a wave-like upward trend from January 2010 to March 2020. Excluding the March 2020 outlier, the median number of funerals per day was 85.6, with a minimum of 64 in December 2011 and a maximum of 110 in March 2016. This strongly suggests that something unusual affected March 2020, when the number of funerals per day reached 141, the highest in ten years. Because the calendar-adjusted series in Figure 2 is smoother than the monthly totals in Figure 1, the per-day series was used for the ARIMA model.
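The calendar adjustment divides each monthly total by the number of days in that month. Using the three monthly totals quoted above (1,973 in December 2011, 3,404 in March 2016, and 4,377 in March 2020), the sketch reproduces the per-day figures of about 64, 110, and 141. Counting days from calendar dates keeps leap-year Februaries correct; the helper name `days_in_month` is illustrative.

```r
# Monthly funeral totals quoted in the text, keyed by year-month
monthly <- c("2011-12" = 1973, "2016-03" = 3404, "2020-03" = 4377)

# Days in a month: gap between its first day and the next month's first day
days_in_month <- function(ym) {
  first <- as.Date(paste0(ym, "-01"))
  as.numeric(seq(first, by = "month", length.out = 2)[2] - first)
}

days    <- vapply(names(monthly), days_in_month, numeric(1))
per_day <- monthly / days                 # average funerals per day
print(round(per_day, 2))                  # 63.65, 109.81 and 141.19
```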

(2). Time Distribution
Figure 3 is an alternative plot that emphasizes the seasonal pattern: the data for each month are collected together in separate mini time plots, and the horizontal lines indicate the mean for each month. Based on Figure 3, the mean number of funerals per day is highest in March, followed by February and April. This means that usually

ARIMA Model Forecasting Analysis
(1). Sequence Characteristic Analysis and Transformation
First, the monthly series from 2010 to 2018 was compiled and plotted (Figure 4). The original series showed rising and falling movements with a seasonal rhythm; it was not smooth, and its variance was uneven. In its ACF, the slow decay as the lag increases reflects the trend, while the "scalloped" shape around lags 12 and 24 reflects the seasonality. The original series was therefore differenced successively until it behaved like a random sequence, and the differenced series appeared random and stationary (Figure 5).
The stationarity of the differenced series was then tested in R with the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test.
The KPSS test statistic was 0.0249, well below the 5% critical values of the test (0.463 for level stationarity and 0.146 for trend stationarity). Because the null hypothesis of the KPSS test is stationarity, the first-differenced series can be treated as stationary and analyzed with an ARIMA model with d = 1. The differenced data were also tested for normality with the Shapiro-Wilk test and for randomness with the Ljung-Box test, for example:
shapiro.test(diff(ypd))

(2). Identification
The significant spike at lag 1 in the ACF suggests a non-seasonal MA(1) component, and the significant spikes at lags 12 and 24 suggest a seasonal MA(2) component. Consequently, we begin with an ARIMA(0,1,1)(0,0,2)12 model, indicating a first difference, a non-seasonal MA(1) component, and a seasonal MA(2) component. By analogous logic applied to the PACF, we could also have started with an ARIMA(1,1,0)(2,0,0)12 model.

Ironically, no model's prediction interval reached the 141.19 funerals per day observed in March 2020, as shown in the table below. If the impact of COVID-19 was unpredictable beforehand, the difference between the actual value and the predicted upper limit of each model in March may be taken as the forecast error attributable to COVID-19. The details are given in Table 2.
It can be seen from Table 2 (column 3) that the forecast error of the models in March was between 450 and 1,070 funerals. This gap may be taken as the number of deaths caused by COVID-19, not only among individuals who tested positive for COVID-19 (column 4) but also from other, previously undetected causes. Table 2 therefore suggests that the death impact of COVID-19 includes not only confirmed cases but also undetected deaths. Several factors probably made the number of deaths during the month significantly higher than the reported figure. First, the coronavirus is difficult to identify. Some people infected with the virus have no symptoms. When the virus does cause symptoms, common ones include fever, dry cough, fatigue, loss of appetite, loss of smell, and body aches. In some people, COVID-19 causes more severe symptoms, such as high fever, severe cough, and shortness of breath, which often indicate pneumonia (Harvard, 2020). Many people may die even before their infection is detected.
Second, some people could not access health care because of the pandemic. The supply of doctors, nurses, and paramedical staff is not limitless, and the care and time they can give are limited; if they fall ill with the virus, they are isolated as well, and their services are lost. The number of hospital beds, intensive care unit beds, and associated equipment in any country is also finite. Finally, the emergence of the coronavirus does not make other illnesses such as heart attacks, strokes, and cancers any less prevalent (Thomas, 2020). People may have other illnesses, and when they get sick, this will be a vital moment
We all hope that this pandemic will be overcome as quickly as possible, but no one knows when it will end. Data may be the key to solving this problem: without reliable data, it is very hard to learn about and predict the course of the pandemic in Indonesia. This analysis tries to fill that information gap. It suggests that the potential death impact of COVID-19 may be significantly higher than reported.
These data aim to guide the public in risk assessment, decision-making, and managing their lives; data are essential in a pandemic like this. Based on the output of the model (Table 2, columns 5 and 7), undetected deaths were 4 to 11 times the number of deaths among positive COVID-19 cases reported on April 1, 2020. People need to be informed that the effect of COVID-19 is likely higher than recorded so that they can be more aware of and alert to COVID-19.
The Indonesian government has issued many public recommendations to deal with COVID-19, and community participation plays a vital role in overcoming the pandemic. The community should follow social distancing and self-isolation protocols. Personal hygiene, including hand washing, should be practised regularly, especially after touching any surface. Moreover, it is essential to educate the public to recognize unusual symptoms, such as a chronic cough or shortness of breath, so that they can seek medical help. Ultimately, people need to be convinced that obeying the law and following government rules are crucial. Social distancing, regular hand washing, and avoiding gatherings are among the ways we can protect and support each other. Together, we can slow the spread of the virus so that the caseload does not overwhelm the limited health resources available, and we can reduce unnecessary deaths and the overall impact of COVID-19.

CONCLUSION
Coronavirus disease 2019 (COVID-19) is a pandemic that has also affected the death toll in Jakarta. In March 2020, the forecast error, that is, the difference between the actual number of funerals in Jakarta and the model predictions, was 450 to 1,070. This gap may represent the potential death impact of COVID-19, which is likely significantly higher than reported. People should be more aware of and alert to the COVID-19 situation. Obeying the law and following government rules are crucial to reducing the spread of COVID-19 and its death impact.