Automation of Best-Fit Model Selection using a Bag of Machine Learning Libraries for Sales Forecasting

2021 ◽  
Vol 12 (06) ◽  
pp. 17-26
Author(s):  
Pauline Sherly Jeba P ◽  
Manju Kiran ◽  
Amit Kumar Sharma ◽  
Divakar Venkatesh

Sales forecasting has become crucial for industries in recent decades, driven by rapid globalization, the widespread adoption of information technology for e-business, and the need to understand market fluctuations, meet business plans, and avoid lost sales. This research predicts automotive industry sales using a bag of multiple machine learning and time series algorithms coupled with historical sales and auxiliary features. Three years of historical sales data (2017 to 2020) were used for model building or training, and one-year (2020-2021) predictions were computed for 900 unique SKUs (stock-keeping units). In the present study, an SKU is a combination of sales office, core business field, and material customer group. Various data cleaning and exploratory data analysis algorithms were applied to the raw datasets before modeling. The mean absolute percentage error (MAPE) was estimated for the individual predictions from the time series and machine learning models, and the best model for each unique SKU was selected as the one with the lowest MAPE value.
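The per-SKU selection rule described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the SKU names, model names, and sales figures are hypothetical, and skipping zero actuals inside `mape` is an assumption made only to keep the metric defined.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: periods whose actual value is zero are skipped,
    because the percentage error is undefined there.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def select_best_models(actuals, forecasts):
    """Return {sku: (model_name, mape)} holding the lowest-MAPE model per SKU.

    actuals:   {sku: [observed sales]}
    forecasts: {sku: {model_name: [predicted sales]}}
    """
    best = {}
    for sku, by_model in forecasts.items():
        scores = {name: mape(actuals[sku], preds) for name, preds in by_model.items()}
        winner = min(scores, key=scores.get)
        best[sku] = (winner, scores[winner])
    return best


# Hypothetical data: two SKUs, two candidate models each.
actuals = {"SKU-A": [100, 120, 110], "SKU-B": [50, 40, 60]}
forecasts = {
    "SKU-A": {"arima": [90, 125, 100], "gbm": [98, 119, 112]},
    "SKU-B": {"arima": [52, 41, 58], "gbm": [70, 30, 45]},
}
best = select_best_models(actuals, forecasts)
for sku, (model, score) in best.items():
    print(f"{sku}: {model} (MAPE {score:.2f}%)")
```

Because the winner is chosen independently for each SKU, different SKUs can end up with different model families, which is the point of keeping a bag of models rather than a single global one.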

Symmetry ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 1942
Author(s):  
Pyae Pyae Phyo ◽  
Yung-Cheol Byun

Energy producers are required to generate an amount of energy that accurately meets end-user requirements. Consequently, energy prediction plays an essential role in the electric power industry. In this paper, we propose a hybrid ensemble deep learning (DL) model that combines a multilayer perceptron (MLP), a convolutional neural network (CNN), long short-term memory (LSTM), and a hybrid CNN-LSTM to improve forecasting performance. These DL architectures are more popular and perform better than other machine learning (ML) models for time series electrical load prediction. Hourly energy data were collected from Jeju Island, South Korea, and used for forecasting. We considered external features associated with meteorological conditions that affect energy. Two years of training data and one year of testing data were preprocessed and arranged into time series, which were then used to train each DL model. The forecasting results of the proposed ensemble model are evaluated using mean square error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), and the error metrics are compared with those of the stand-alone DL models: MLP, CNN, LSTM, and CNN-LSTM. Our ensemble model performs better than the other forecasting models, with a minimum MAPE of 0.75%, and was proven to be inherently symmetric for forecasting time-series energy and demand data, which is of utmost concern to the power system sector.
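The abstract does not say how the four networks are combined. As a hedged sketch, the snippet below assumes the simplest option, an unweighted average of the members' forecasts, and scores it with MSE and MAE. The member names and load values are hypothetical, and the data are contrived so that the members' errors partly cancel; averaging is not guaranteed to help in general.

```python
def mse(actual, predicted):
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def average_ensemble(member_forecasts):
    """Element-wise mean of several equally long forecasts (assumed combiner)."""
    n = len(member_forecasts)
    return [sum(step) / n for step in zip(*member_forecasts)]


# Hypothetical hourly load values and stand-alone model forecasts.
actual = [10.0, 12.0, 11.0, 13.0]
members = {
    "mlp": [11.0, 13.0, 12.0, 14.0],   # consistently 1 unit too high
    "lstm": [9.0, 11.0, 10.0, 12.0],   # consistently 1 unit too low
    "cnn": [10.0, 12.5, 11.0, 12.5],
}
ensemble = average_ensemble(list(members.values()))

for name, forecast in members.items():
    print(name, round(mse(actual, forecast), 4), round(mae(actual, forecast), 4))
print("ensemble", round(mse(actual, ensemble), 4), round(mae(actual, ensemble), 4))
```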


2021 ◽  
Vol 7 ◽  
pp. e746
Author(s):  
Muhammad Naeem ◽  
Jian Yu ◽  
Muhammad Aamir ◽  
Sajjad Ahmad Khan ◽  
Olayinka Adeleye ◽  
...  

Background: Forecasting the timing of a forthcoming pandemic reduces the impact of disease by allowing precautionary steps such as public health messaging and raising awareness among doctors. With the continuous and rapid increase in the cumulative incidence of COVID-19, the research community is using statistical and outbreak prediction models, including various machine learning (ML) models, to track and predict the trend of the epidemic and to develop appropriate strategies to combat and manage its spread. Methods: In this paper, we present a comparative analysis of various ML approaches, including Support Vector Machine, Random Forest, K-Nearest Neighbor and Artificial Neural Network, in predicting the COVID-19 outbreak in the epidemiological domain. We first apply the autoregressive distributed lag (ARDL) method to identify and model the short- and long-run relationships in the COVID-19 time-series datasets. That is, we determine the lags between a response variable and its explanatory time series variables, which serve as independent variables. Then, the significant variables and their lags are used in the regression model selected by the ARDL for predicting and forecasting the trend of the epidemic. Results: Statistical measures, namely Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Symmetric Mean Absolute Percentage Error (SMAPE), are used to assess model accuracy. The MAPE values of the best-selected models for confirmed, recovered and death cases are 0.003, 0.006 and 0.115, respectively, which fall under the category of highly accurate forecasts. In addition, we computed 15-day-ahead forecasts for daily deaths, recovered, and confirmed patients; the cases fluctuated across time in all three series. The results also reveal the advantages of ML algorithms for supporting decision-making on evolving short-term policies.
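Of the four measures listed, SMAPE has several variants in the literature. The sketch below assumes one common form, in which each absolute error is divided by the mean of |actual| and |forecast|, and treats a term with both values equal to zero as contributing no error. Which variant the paper used is not stated, and the numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def smape(actual, predicted):
    """Symmetric MAPE in percent (assumed variant, bounded by 200%).

    Each term is |a - p| / ((|a| + |p|) / 2); a term where both values
    are zero is counted as zero error (an assumption, not a standard).
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        if denom != 0:
            total += abs(a - p) / denom
    return 100.0 * total / len(actual)


# Hypothetical daily counts; the last day has no cases and none predicted.
actual = [100.0, 200.0, 0.0]
predicted = [110.0, 180.0, 0.0]
print(f"RMSE  = {rmse(actual, predicted):.3f}")
print(f"SMAPE = {smape(actual, predicted):.3f}%")
```

Unlike MAPE, this form stays defined on days with zero observed cases, which can occur in daily epidemic counts.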


Author(s):  
David Adugh Kuhe ◽  
Jonathan Atsua Ikughur

Coronaviruses belong to a large family of viruses that affect the hepatic, gastrointestinal, neurological and respiratory systems. The increase in the daily number of confirmed COVID-19 cases and deaths in countries around the world has brought social, economic and political activities to a standstill, affecting individuals, government, and the public and private sectors. In this study, an autoregressive integrated moving average (ARIMA) time series model was used to model and forecast daily confirmed, recovered, and death cases of COVID-19 in Nigeria, using daily data from 27/02/2020 to 31/07/2020 obtained from the Nigeria Centre for Disease Control (NCDC) website. The data from 27/02/2020 to 16/07/2020 were used for model building, while the 15 observations from 17/07/2020 to 31/07/2020 were used for training and forecast evaluations. Time plots and the Dickey-Fuller Generalized Least Squares unit root test were used to investigate the stationarity properties of the data. The Schwarz Information Criterion (SIC), in conjunction with the log likelihood, was used to search for optimal ARIMA models, while Mean Absolute Percentage Error (MAPE) was used for forecast evaluation. Results showed that all the study variables were difference stationary and hence integrated of order one, I(1). ARIMA(2,1,4), ARIMA(2,1,2) and ARIMA(2,1,3) models were selected as the best candidates for modeling and forecasting the confirmed, recovered and death cases of COVID-19 in Nigeria, respectively. The study found an approximate COVID-19 life cycle of 12 days among the infected population. The 15-day forecasts from the ARIMA(2,1,4) and ARIMA(2,1,2) models showed increases in the daily number of confirmed and recovered cases of COVID-19 in Nigeria. The forecasts from the ARIMA(2,1,3) model, however, showed a fluctuating trend with a decline in the number of deaths due to the disease. The study further showed that improving on the present approach to treatment will further decrease the number of casualties due to COVID-19 in Nigeria.
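A series that is integrated of order one, I(1), becomes stationary after one round of differencing, which is what the middle "1" in ARIMA(p,1,q) denotes. The sketch below shows only that differencing step and its inverse, under the assumption that forecasts are produced on the differenced scale and then accumulated back to levels. It does not fit any ARIMA model, and the case counts and differenced forecasts are hypothetical.

```python
def difference(series):
    """First difference: d[t] = y[t+1] - y[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(last_level, diff_forecasts):
    """Accumulate forecasts of the differenced series back into levels."""
    levels = []
    level = last_level
    for d in diff_forecasts:
        level += d
        levels.append(level)
    return levels


# Hypothetical daily case counts.
cases = [5, 8, 12, 17, 23]
d1 = difference(cases)

# Hypothetical forecasts of the next two daily changes.
levels = undifference(cases[-1], [7, 8])
print(d1, levels)
```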


2021 ◽  
Author(s):  
Elham Fijani ◽  
Khabat Khosravi ◽  
Rahim Barzegar ◽  
John Quilty ◽  
Jan Adamowski ◽  
...  

Abstract Random Tree (RT) and Iterative Classifier Optimizer (ICO) based on Alternating Model Tree (AMT) regressor machine learning (ML) algorithms, coupled with Bagging (BA) or Additive Regression (AR) hybrid algorithms, were applied to forecast Lake Superior and Lake Michigan water level (WL) multiple steps ahead (up to three months). The partial autocorrelation function (PACF) of each lake’s WL time series was used to identify the most important lag times (up to five months in both lakes) as potential inputs. The WL time series data were partitioned into training (1918 to 1988) and testing (1989 to 2018) sets for model building and evaluation, respectively. The developed algorithms were validated with statistically and visually based metrics using the testing data. Although both hybrid ensemble algorithms improved the individual ML algorithms’ performance, the BA algorithm outperformed the AR algorithm. As a novel model in forecasting problems, the ICO algorithm was shown to have great potential in generating robust multistep lake WL forecasts.
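The input framing described above, with lagged values as inputs and a value several steps ahead as the target, can be sketched as follows. The helper names `make_supervised` and `split_by_year` and the toy data are hypothetical. The five-lag and three-step settings and the 1988 cut-off year echo the abstract, but the authors' actual preprocessing is not described there.

```python
def make_supervised(series, n_lags, horizon):
    """Build (inputs, target) rows for direct multistep forecasting.

    inputs: the n_lags most recent values up to time t (oldest first)
    target: the value `horizon` steps after time t
    """
    rows = []
    for t in range(n_lags - 1, len(series) - horizon):
        inputs = series[t - n_lags + 1 : t + 1]
        target = series[t + horizon]
        rows.append((inputs, target))
    return rows


def split_by_year(records, last_train_year):
    """Chronological split of (year, value) records; no shuffling."""
    train = [v for y, v in records if y <= last_train_year]
    test = [v for y, v in records if y > last_train_year]
    return train, test


# Toy monthly series: five lags as inputs, target three months ahead.
series = list(range(1, 11))
rows = make_supervised(series, n_lags=5, horizon=3)

# Toy yearly records split at 1988, as in the abstract.
records = [(1986, 183.1), (1987, 183.4), (1988, 183.2), (1989, 183.0), (1990, 183.3)]
train, test = split_by_year(records, last_train_year=1988)
print(rows[0], len(train), len(test))
```

Splitting by date rather than at random keeps every test observation later than every training observation, so the evaluation never uses future information.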


2020 ◽  
Vol 13 (5) ◽  
pp. 827-832
Author(s):  
Iflah Aijaz ◽  
Parul Agarwal

Introduction: Auto-Regressive Integrated Moving Average (ARIMA) and Artificial Neural Networks (ANN) are the leading linear and non-linear models, respectively, in Machine Learning for time series forecasting. Objective: This survey paper presents a review of recent advances in Machine Learning techniques and artificial intelligence used for forecasting different events. Methods: This paper presents an extensive survey of work done in the field of Machine Learning in which hybrid models are compared with the basic models for forecasting on the basis of error parameters such as Mean Absolute Deviation (MAD), Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Normalized Root Mean Square Error (NRMSE). Results: Table 1 summarizes the important papers discussed in this paper on the basis of parameters that explain the efficiency of hybrid models and of models used in isolation. Conclusion: Hybrid models have achieved more accurate results than the same models used in isolation, yet some research papers argue that hybrids cannot always outperform individual models.


2021 ◽  
Vol 13 (4) ◽  
pp. 576
Author(s):  
Hua Su ◽  
Xuemei Lu ◽  
Zuoqi Chen ◽  
Hongsheng Zhang ◽  
Wenfang Lu ◽  
...  

Chlorophyll-a (chl-a) is an important parameter of water quality and its concentration can be directly retrieved from satellite observations. The Ocean and Land Color Instrument (OLCI), a new-generation water-color sensor onboard Sentinel-3A and Sentinel-3B, is an excellent tool for marine environmental monitoring. In this study, we introduce a new machine learning model, Light Gradient Boosting Machine (LightGBM), for estimating time-series chl-a concentration in Fujian’s coastal waters using multitemporal OLCI data and in situ data. We applied the Case 2 Regional CoastColour (C2RCC) processor to obtain OLCI band reflectance and constructed four spectral indices based on OLCI feature bands as supplementary input features. We also used root-mean-square error (RMSE), mean absolute error (MAE), median absolute percentage error (MAPE), and R² as performance indicators. The results indicate that the addition of spectral indices can easily improve the prediction accuracy of the model, and normalized fluorescence height index (NFHI) has the best performance, with an RMSE of 0.38 µg/L, MAE of 0.22 µg/L, MAPE of 28.33%, and R² of 0.785. Moreover, we used the well-known band ratio and three-band methods for chl-a estimation validation, and another two OLCI chl-a products were adopted for comparison (OC4Me chl-a and Inverse Modelling Technique (IMT) Neural Net chl-a). The results confirmed that the LightGBM model outperforms the traditional methods and OLCI chl-a products. This study provides an effective remote sensing technique for coastal chl-a concentration estimation and promotes the advantage of OLCI data in ocean color remote sensing.
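In this abstract MAPE abbreviates the median absolute percentage error, not the mean used in the other entries, so the two are not directly comparable. A minimal sketch of that metric and of R², assuming the usual definition of R² as one minus the ratio of residual to total sum of squares, with hypothetical concentrations:

```python
from statistics import median


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def median_ape(actual, predicted):
    """Median absolute percentage error, in percent (actuals must be nonzero)."""
    return 100.0 * median(abs((a - p) / a) for a, p in zip(actual, predicted))


# Hypothetical chl-a concentrations in µg/L.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.3, 3.6]
print(f"R2 = {r_squared(actual, predicted):.3f}")
print(f"median APE = {median_ape(actual, predicted):.1f}%")
```

The median is less sensitive than the mean to a few samples with very large percentage errors, which are common when the true concentration is close to zero.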


Author(s):  
Marie Luthfi Ashari ◽  
Mujiono Sadikin

To win market competition, pharmaceutical companies must produce high-quality medicinal products. Producing quality products requires good and efficient production planning, and one basis for production planning is sales prediction. PT. Metiska Farma has applied a prediction method in its production process, but the resulting predictions are inaccurate, so market demand is not met optimally. To minimize this inaccuracy, the research presented in this paper trials prediction using a Machine Learning technique, Long Short Term Memory (LSTM) regression. The proposed technique was tested on a sales dataset for product “X” from PT. Metiska Farma, with Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE) as performance measures. The results of this research are the average error values from modeling the training data and the testing data. They show that LSTM regression produced sales predictions with an RMSE of 286.465.424 for the training data and 187.013.430 for the testing data, and MAPE values of 787% and 309% for the training and testing data, respectively.


Author(s):  
Aritra Sen ◽  
Shalmoli Dutta

Mortality is a continuous force of attrition, tending to reduce the population, a prime negative force in the balance of vital processes (Bhasin and Nag, 2004). The Sample Registration System (SRS) has served as the only full-scale source of annual data on vital events in India since 1969-70. Few studies have examined the trends and patterns of mortality across time and regions in India (Preston and Bhat, 1984). The Under 5 Mortality Rate (U5MR) decreased by more than half from 1970 to 2017, but in contrast little is known about the mortality patterns of older children (5-9) and young adolescents (10-14), and few studies have examined their changing trends (Masquelier et al., 2018). Using annual data for the 5-14 age group, the declining trend in mortality is studied from 1970 to 2013. The linear trend in the time series plot suggests analysis using the time series models AR(p), MA(q), ARMA(p,q), Box-Jenkins ARIMA(p,d,q) and Random Walk with drift to find the best fit to the trend of the data. The orders of the time series models were determined by studying the ACF and PACF plots, and the coefficients were derived using the Yule-Walker equation matrix. An in-sample forecast for the years 2014-17 is taken. The Mean Squared Error (MSE) and the Mean Absolute Percentage Error (MAPE) are used as measures of accuracy to determine the best-fit model. ARIMA(3,1,1) produced the lowest values, making it the best-fit model. Out-of-sample forecasting was done for 2018-2025. The forecast shows that, at the current trend, India would have 0.03 deaths per 1000 population in the 5-14 age group in 2025, showing that the government’s policies and health care interventions towards realization of the MDG4 goal are working positively.
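The Yule-Walker equations relate the autoregressive coefficients to the autocorrelations of the series. As an illustration only, the sketch below solves the two-lag case, AR(2), in closed form; the study itself selected ARIMA(3,1,1), whose estimation is more involved. The function names are hypothetical, and the check uses the theoretical autocorrelations of a known AR(2) process rather than real mortality data.

```python
def autocorr(series, lag):
    """Sample autocorrelation at the given lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series)
    ck = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return ck / c0


def yule_walker_ar2(r1, r2):
    """Solve the AR(2) Yule-Walker system for (phi1, phi2).

    The system is:
        phi1 + r1 * phi2 = r1
        r1 * phi1 + phi2 = r2
    """
    det = 1.0 - r1 * r1
    if det == 0:
        raise ValueError("lag-1 autocorrelation of +/-1 makes the system singular")
    phi1 = r1 * (1.0 - r2) / det
    phi2 = (r2 - r1 * r1) / det
    return phi1, phi2


# Theoretical autocorrelations of an AR(2) process with phi1=0.5, phi2=0.3.
r1 = 0.5 / (1.0 - 0.3)
r2 = 0.5 * r1 + 0.3
phi1, phi2 = yule_walker_ar2(r1, r2)
print(round(phi1, 6), round(phi2, 6))
```

In practice `r1` and `r2` would come from `autocorr(series, 1)` and `autocorr(series, 2)` computed on the (differenced) data.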


2017 ◽  
Vol 7 (1) ◽  
pp. 54-60
Author(s):  
Johannes Tshepiso Tsoku ◽  
Nonofo Phukuntsi ◽  
Lebotsa Daniel Metsileng

The study employs the Box-Jenkins methodology to forecast South African gold sales. For a resource economy like South Africa, where metals and minerals account for a high proportion of GDP and export earnings, the decline in gold sales is very disturbing. The Box-Jenkins time series technique was used to analyse monthly gold sales for the period January 2000 to June 2013 with the following steps: model identification, model estimation, diagnostic checking and forecasting. Furthermore, the prediction accuracy is tested using mean absolute percentage error (MAPE). From the analysis, a seasonal ARIMA(4,1,4)×(0,1,1)₁₂ was found to be the “best fit model”, with a MAPE value of 11% indicating that the model is fit to be used to predict or forecast future gold sales for South Africa. In addition, the forecast values show that there will be a decrease in overall gold sales for the first six months of 2014. It is hoped that the study will help the public and private sectors to understand the gold sales or output scenario and later plan gold mining activities in South Africa. Furthermore, it is hoped that this research paper has demonstrated the significance of the Box-Jenkins technique for this area of research and that it will be applied in the future.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Rutvik V. Shah ◽  
Gillian Grennan ◽  
Mariam Zafar-Khan ◽  
Fahad Alim ◽  
Sujit Dey ◽  
...  

Abstract Depression is a multifaceted illness with large interindividual variability in clinical response to treatment. In the era of digital medicine and precision therapeutics, new personalized treatment approaches are warranted for depression. Here, we use a combination of longitudinal ecological momentary assessments of depression, neurocognitive sampling synchronized with electroencephalography, and lifestyle data from wearables to generate individualized predictions of depressed mood over a 1-month time period. This study thus develops a systematic pipeline for N-of-1 personalized modeling of depression using multiple modalities of data. In the models, we integrate seven types of supervised machine learning (ML) approaches for each individual, including ensemble learning and regression-based methods. All models were verified using fourfold nested cross-validation. The best fit, as benchmarked by the lowest mean absolute percentage error, was obtained by a different type of ML model for each individual, demonstrating that there is no one-size-fits-all strategy. The voting regressor, which is a composite strategy across ML models, was the best performing on average across subjects. However, the individually selected best-fit models still showed significantly less error than the voting regressor across subjects. For each individual’s best-fit personalized model, we further extracted top-feature predictors using Shapley statistics. Shapley values revealed distinct feature determinants of depression over time for each person, ranging from co-morbid anxiety to physical exercise, diet, momentary stress and breathing performance, sleep times, and neurocognition. In the future, these personalized features can serve as targets for a personalized ML-guided, multimodal treatment strategy for depression.

