Swarm Optimized Grey SVR and ARIMA for Modeling of Larceny-Theft Rate with Economic Indicators

Author(s):  
Razana Alwee ◽  
Siti Mariyam Shamsuddin ◽  
Roselina Sallehuddin

As real-world data, larceny-theft rates are most likely to have both linear and nonlinear components, so a single linear or nonlinear model may not be sufficient to model them. Thus, a hybrid of linear and nonlinear models is proposed for modeling the larceny-theft rate. The proposed model combines Support Vector Regression (SVR) and Autoregressive Integrated Moving Average (ARIMA) models, and particle swarm optimization is used to optimize the parameters of both. The model also includes a feature selection step that combines grey relational analysis and SVR to choose the economic indicators that are significant for the larceny-theft rate. The experimental results show that the proposed model is more accurate than the linear, nonlinear, and existing hybrid models in modeling the larceny-theft rate of the United States.
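
The linear-plus-nonlinear hybridization can be sketched with the common additive scheme: fit a linear model first, then model its residuals with a nonlinear learner and add the two forecasts. This is an illustrative sketch, not the authors' implementation: a least-squares AR(1) fit stands in for ARIMA, a k-nearest-neighbour regressor on lagged residuals stands in for SVR, and the PSO tuning and grey relational feature selection are omitted. All function names are hypothetical.

```python
from statistics import mean

def fit_ar1(y):
    """Least-squares fit of y[t] = c + phi * y[t-1] (a stand-in for ARIMA)."""
    x, z = y[:-1], y[1:]
    mx, mz = mean(x), mean(z)
    sxx = sum((a - mx) ** 2 for a in x)
    phi = sum((a - mx) * (b - mz) for a, b in zip(x, z)) / sxx
    c = mz - phi * mx
    return c, phi

def knn_predict(history, query, k=3):
    """Average the successors of the k lagged values closest to `query` (a stand-in for SVR)."""
    pairs = [(history[i], history[i + 1]) for i in range(len(history) - 1)]
    nearest = sorted(pairs, key=lambda p: abs(p[0] - query))[:k]
    return mean(b for _, b in nearest)

def hybrid_one_step(y, k=3):
    """One-step forecast = linear AR(1) forecast + nonlinear forecast of its residual."""
    c, phi = fit_ar1(y)
    # In-sample residuals: the part of the series the linear model did not explain.
    residuals = [y[t] - (c + phi * y[t - 1]) for t in range(1, len(y))]
    linear_part = c + phi * y[-1]
    nonlinear_part = knn_predict(residuals, residuals[-1], k)
    return linear_part + nonlinear_part
```

With real libraries, scikit-learn's SVR and statsmodels' ARIMA could replace the two stand-ins without changing the overall structure.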

Author(s):  
Osman Yakubu ◽  
Narendra Babu C.

Forecasting electricity consumption is vital: it guides policy makers and electricity distribution companies in formulating policies to manage production and curb pilfering. Accurately forecasting electricity consumption is a challenging task, and relying on a single model to forecast consumption data that comprise both linear and nonlinear components produces inaccurate results. In this paper, a hybrid model using autoregressive integrated moving average (ARIMA) and deep long short-term memory (DLSTM) models based on discrete Fourier transform (DFT) decomposition is presented. Owing to its superior decomposition capability, DFT filtering can efficiently decompose the data into linear and nonlinear components. ARIMA is employed to model the linear component and DLSTM the nonlinear component; the two predictions are then combined to obtain the final predicted consumption. The proposed technique is applied to household electricity consumption data from France to obtain one-day, one-week and ten-day-ahead forecasts. The results reveal that the proposed model outperforms the other benchmark models considered in this investigation, attaining lower error values. The proposed model can accurately decompose time series data without exhibiting performance degradation, thereby enhancing prediction accuracy.
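
The DFT filtering step can be illustrated with a plain-Python sketch. The paper's exact filter design is not given here, so the cutoff rule (keep the lowest-frequency bins as the smooth component, treat the remainder as the nonlinear component) and the names `dft_split` and `cutoff` are assumptions for illustration; in practice `numpy.fft` would replace the O(n²) loops.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(X):
    """Inverse DFT, returning the real part (inputs here are conjugate-symmetric)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def dft_split(x, cutoff):
    """Split x into a low-frequency part (bins with |k| <= cutoff) and the high-frequency remainder."""
    n = len(x)
    X = dft(x)
    # Keep bin k and its mirror n - k together so the filtered signal stays real.
    low = [X[k] if min(k, n - k) <= cutoff else 0 for k in range(n)]
    smooth = idft(low)
    rough = [a - b for a, b in zip(x, smooth)]
    return smooth, rough
```

By construction `smooth + rough` reproduces the input, so the two component forecasts can simply be summed, as the abstract describes.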


2021 ◽  
Author(s):  
Mohammad Mamouei ◽  
Karthik Budidha ◽  
Nystha Baishya ◽  
Meha Qassem ◽  
Panayiotis Kyriacou

The linear relationship between optical absorbance and the concentration of analytes, as postulated by the Beer-Lambert law, is one of the fundamental assumptions on which much of the optical spectroscopy literature is explicitly or implicitly based. The common use of linear regression models such as principal component regression and partial least squares exemplifies how the linearity assumption is upheld in practical applications. However, the literature also establishes that deviations from the Beer-Lambert law can be expected when a) the light source is far from monochromatic, b) the concentrations of analytes are very high, and c) the medium is highly scattering. The lack of a quantitative understanding of when such nonlinearities become predominant, along with the mainstream use of nonlinear machine learning models in different fields, has given rise to the use of methods such as random forests, support vector regression, and neural networks in spectroscopic applications. This raises the question of whether, given the small number of samples and the high number of variables in many spectroscopic datasets, nonlinear effects are significant enough to justify the additional model complexity. In the present study, we empirically investigate this question in relation to lactate, an important biomarker. In particular, to analyze the effects of scattering matrices, three datasets were generated by varying the concentration of lactate in phosphate buffer solution, human serum, and sheep blood. A fourth dataset consisted of in vivo, transcutaneous spectra obtained from healthy volunteers in an exercise study. Linear and nonlinear models were fitted to each dataset, and measures of model performance were compared to test the assumption of linearity. To isolate the effects of high concentrations, the phosphate buffer solution dataset was augmented with six samples with very high lactate concentrations (100-600 mmol/L).
Subsequently, three partly overlapping datasets were extracted with lactate concentrations of 0-11 mmol/L, 0-20 mmol/L and 0-600 mmol/L, and the performance of linear and nonlinear models was again compared on each. This analysis did not provide any evidence of substantial nonlinearities due to high concentrations. However, the results suggest that nonlinearities in scattering media may be substantial, justifying the use of complex, nonlinear models.
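
For reference, the Beer-Lambert law for a mixture of absorbing analytes can be written as below, using standard notation rather than symbols taken from the paper: path length ℓ, molar absorptivity ε_i(λ) and concentration c_i of analyte i, incident and transmitted intensities I_0 and I. Because absorbance is linear in each c_i, calibration models such as PCR and PLS, which predict concentration as a weighted sum of absorbances at p wavelengths (second line), are a natural fit; the deviations listed above (polychromatic light, very high concentrations, strong scattering) are what break this linearity.

```latex
A(\lambda) = \log_{10}\frac{I_0(\lambda)}{I(\lambda)} = \ell \sum_{i} \varepsilon_i(\lambda)\, c_i
\qquad\qquad
\hat{c} = b_0 + \sum_{j=1}^{p} b_j\, A(\lambda_j)
```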


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254137
Author(s):  
Muhammad Adam Norrulashikin ◽  
Fadhilah Yusof ◽  
Nur Hanani Mohd Hanafiah ◽  
Siti Mariam Norrulashikin

The increasing trend in the number of new influenza cases each year, as reported by the WHO, is concerning, especially in Malaysia. To date, no local healthcare research has applied time series forecasting methods to predict future disease outbreaks in Malaysia, specifically of influenza. Addressing this gap could increase awareness of the disease and help healthcare workers be better prepared to prevent its spread. This paper applies a hybrid ARIMA-SVR approach to forecasting monthly influenza cases in Malaysia. The Autoregressive Integrated Moving Average (ARIMA) model (using the Box-Jenkins method) and the Support Vector Regression (SVR) model were used to capture the linear and nonlinear components of the monthly influenza cases, respectively, and the hybrid model was expected to perform better. Data from the World Health Organization (WHO) website, consisting of weekly Influenza Serology A cases in Malaysia from 2006 to 2019, were used for this study and converted into monthly data. The findings show that monthly influenza cases could be forecast efficiently by all three comparator models, as each outperformed the benchmark naïve model. However, SVR with a linear kernel produced the lowest RMSE and MAE on the test dataset, indicating the best performance among the comparators. This suggests that SVR has the potential to produce more consistent forecasts than ARIMA and the ARIMA-SVR hybrid model.
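
The evaluation described above (a naïve benchmark scored with RMSE and MAE on a test set) can be sketched as follows. The abstract does not define its naïve model, so the common "repeat the last observed value" forecast is assumed here; the function names and the numbers in the test are hypothetical.

```python
from math import sqrt

def naive_forecast(train, horizon):
    """Naive benchmark (assumed form): repeat the last observed value for every future step."""
    return [train[-1]] * horizon

def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

A candidate model "outperforms the benchmark" in this sense when its RMSE and MAE on the same test period are lower than those of `naive_forecast`.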


2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Razana Alwee ◽  
Siti Mariyam Hj Shamsuddin ◽  
Roselina Sallehuddin

Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, real crime data commonly consist of both linear and nonlinear components, and a single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) for forecasting crime rates. SVR is robust with small training data and high-dimensional problems, while ARIMA can model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, and ARIMA is not robust when applied to small data sets. To overcome these problems, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model produces more accurate forecasts than the individual models.
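
A minimal global-best particle swarm optimizer illustrates how PSO can search a parameter space. In the paper the objective would be a forecasting error of SVR or ARIMA for a candidate parameter vector; here a hypothetical quadratic `toy_error` over (log10 C, log10 epsilon) stands in for that error, and the inertia and acceleration constants (`w`, `c1`, `c2`) are common textbook choices, not values reported by the authors.

```python
import random

def pso(objective, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position seen by the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the personal best + pull toward the global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_error(p):
    """Hypothetical stand-in for a validation error as a function of (log10 C, log10 epsilon)."""
    return (p[0] - 1) ** 2 + (p[1] + 2) ** 2

best, best_val = pso(toy_error, [(-5, 5), (-5, 5)])
```

Searching over log-scaled parameters is a frequent choice for SVR because sensible values of C and epsilon span several orders of magnitude.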


2015 ◽  
Vol 7 (11) ◽  
pp. 190
Author(s):  
Mamdouh A. M. Abdelsalam ◽  
Doaa Akl Ahmed

This paper aims to improve prediction accuracy by combining forecasts. In forecast combination, the crucial issue is the selection of the weights assigned to each model. In addition to traditional methods, we also propose two more sophisticated approaches: a modified Bayesian Moving Average (BMA) and an Extended Time-Varying Coefficient (ETVC) approach. The first technique merges the traditional BMA with other frequentist combination schemes to avoid the subjective prior within the traditional Bayesian technique. The suggested ETVC approach provides consistent time-varying parameters even in the presence of measurement errors or omitted-variable bias, and when the true functional form is unknown. Both linear and nonlinear models are included to calculate forecasts of quarterly Egyptian CPI inflation. We find that the proposed ETVC scheme is superior to the best individual model and to all other static combination schemes, including the time-varying scheme based on random-walk coefficient updating (TVR). Additionally, the modified Bayesian approach improves on the traditional BMA and overcomes its dependence on an arbitrary choice of initial priors.
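
To make "selecting the weights" concrete, one traditional static scheme weights each model by the inverse of its past mean squared forecast error. It is shown here only as an example of a baseline; it is not the modified BMA or ETVC method proposed in the paper, and the function names and numbers are hypothetical.

```python
def inverse_mse_weights(errors_by_model):
    """Weight each model by 1/MSE of its past forecast errors, normalised to sum to 1."""
    inv = [1.0 / (sum(e * e for e in errs) / len(errs)) for errs in errors_by_model]
    total = sum(inv)
    return [v / total for v in inv]

def combine(forecasts, weights):
    """Weighted combination of the individual model forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))

# Model A has MSE 1 and model B has MSE 4, so A receives four times B's weight.
weights = inverse_mse_weights([[1, -1], [2, -2]])
```

Time-varying schemes such as TVR or ETVC differ in that the weights are re-estimated as new observations arrive rather than fixed once.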


Author(s):  
Prajoy Podder ◽  
Aditya Khamparia ◽  
M. Rubaiyat Hossain Mondal ◽  
Mohammad Atikur Rahman ◽  
Subrato Bharati

Since December 2019, the world has been fighting against coronavirus disease (COVID-19), which is caused by a novel coronavirus termed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This work focuses on applications of machine learning algorithms in the context of COVID-19. First, regression analysis is performed to model the numbers of confirmed cases and deaths. Our experiments show that autoregressive integrated moving average (ARIMA) can reliably model the increase in the number of confirmed cases and predict future cases. Second, a number of classifiers are used to predict whether a COVID-19 patient needs to be admitted to an intensive care unit (ICU) or semi-ICU. For this, classification algorithms are applied to a dataset of 5644 samples. The most significant attributes of this dataset are selected by feature selection with an ExtraTrees classifier, and Proteina C reativa (C-reactive protein, mg/dL) is found to be the highest-ranked feature. Random forest, logistic regression, support vector machine, XGBoost, stacking and voting classifiers are then applied to the top 10 selected attributes. Results show that the random forest and hard voting classifiers achieve the highest classification accuracy, near 98%, and the highest recall, 98%, in predicting the need for admission to ICU/semi-ICU units.
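
Hard voting and the two reported metrics can be sketched in a few lines. The labels below are toy values, with 1 assumed to mean "needs ICU/semi-ICU admission"; this is not the authors' pipeline, and in scikit-learn the library equivalent of the voting step is `VotingClassifier(voting="hard")`.

```python
from collections import Counter

def hard_vote(*predictions):
    """Majority vote across classifiers, sample by sample (ties go to the label encountered first)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def accuracy_and_recall(y_true, y_pred, positive=1):
    """Accuracy over all samples; recall = TP / (TP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, recall
```

Recall matters here because a missed ICU admission (a false negative) is usually costlier than a false alarm.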


Author(s):  
Razana Alwee ◽  
Siti Mariyam Hj Shamsuddin ◽  
Roselina Sallehuddin

Feature selection is very important in multivariate models because the accuracy of the forecasts produced by a model is highly dependent on the selected features. The purpose of this study is to propose grey relational analysis and support vector regression for feature selection. The features are economic indicators used to forecast the property crime rate. Grey relational analysis selects the best data series to represent each economic indicator and ranks the economic indicators according to their importance to the property crime rate. Next, support vector regression is used to select the significant economic indicators, with particle swarm optimization estimating the parameters of the support vector regression. In this study, we use the unemployment rate, consumer price index, gross domestic product and consumer sentiment index as the economic indicators, together with the property crime rate of the United States. Our experiments show that gross domestic product, the unemployment rate and the consumer price index are the most influential economic indicators. The proposed method also produces better forecasting accuracy than multiple linear regression.
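
The ranking step of grey relational analysis can be sketched with Deng's grey relational coefficient and grade. Min-max normalisation and the distinguishing coefficient rho = 0.5 are common defaults assumed here rather than settings taken from the paper; the subsequent SVR-based selection is not shown, and the series in the test are made up.

```python
def minmax(series):
    """Scale a series to [0, 1] (assumed normalisation; initial-value scaling is another option)."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]

def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate series against the reference series."""
    x0 = minmax(reference)
    xs = [minmax(c) for c in candidates]
    deltas = [[abs(a - b) for a, b in zip(x0, x)] for x in xs]
    d_min = min(min(d) for d in deltas)
    d_max = max(max(d) for d in deltas)
    if d_max == 0:  # every candidate matches the reference exactly
        return [1.0] * len(xs)
    grades = []
    for d in deltas:
        # Grey relational coefficient at each time point, then its average (the grade).
        coeffs = [(d_min + rho * d_max) / (dk + rho * d_max) for dk in d]
        grades.append(sum(coeffs) / len(coeffs))
    return grades
```

Sorting the indicators by grade, highest first, gives the importance ranking that the abstract describes.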


Author(s):  
Olumide Sunday Adesina ◽  
Samson Adeniyi Onanaye ◽  
Dorcas Okewole ◽  
Amanze C. Egere

The emergence of the global pandemic known as COVID-19 has significantly affected human lives, and governments all over the world have taken measures to minimize the rate of spread of the virus, one of which is enforcing lockdowns. In this study, autoregressive fractionally integrated moving average (ARFIMA) models were used to model and forecast what the daily new cases of COVID-19 would have been in the ten days after the lockdown was eased in Nigeria, and these forecasts were compared with the actual new cases for the period when the lockdown was eased. The proposed ARFIMA model was compared with ARIMA(1, 0, 0) and ARIMA(1, 0, 1) and found to outperform these classical ARIMA models based on AIC and BIC values. The results show that the rate of spread of COVID-19 would have been significantly lower if the strict lockdown had continued. The ARFIMA model was further used to forecast new COVID-19 cases for the ten days starting from 31 August 2020. This study therefore recommends that the government further enforce measures to reduce the spread of the virus if business is to continue as usual.
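
What distinguishes ARFIMA from ARIMA is the fractional differencing operator (1 - B)^d with non-integer d. A sketch of its binomial expansion and a truncated application to a series follows; the truncation length `n_terms` is an assumption of this sketch, and estimating d and the AR/MA terms is not shown. With d = 1 it reduces to the ordinary first difference used in ARIMA.

```python
def frac_diff_weights(d, n_terms):
    """Coefficients w_k of (1 - B)^d = sum_k w_k B^k, via w_k = w_{k-1} * (k - 1 - d) / k."""
    w = [1.0]
    for k in range(1, n_terms):
        w.append(w[-1] * (k - 1 - d) / k)
    return w

def frac_diff(x, d, n_terms):
    """Apply truncated fractional differencing to x (the first n_terms - 1 points are dropped)."""
    w = frac_diff_weights(d, n_terms)
    return [sum(w[k] * x[t - k] for k in range(n_terms)) for t in range(n_terms - 1, len(x))]
```

For 0 < d < 0.5 the weights decay slowly rather than cutting off, which is how ARFIMA captures long memory in a series.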


2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Salwa Waeto ◽  
Khanchit Chuarkham ◽  
Arthit Intarasit

Forecasting the tendencies of time series is a challenging task that yields a better understanding of the series. The purpose of this paper is to present a hybrid model that combines support vector regression with an autoregressive integrated moving average (ARIMA) model, formulated using a hybrid methodology. The proposed model is more convenient for practical use. This article focuses on modeling the tendencies of time series for the insurgency in southern Thailand. Empirical results using the monthly numbers of deaths, injuries, and incidents in Thailand's southern insurgency indicate that the proposed hybrid model is an effective approach and performs better than the classical time series model or support vector regression alone. The best forecast accuracy was determined using the mean square error.

