Negative Binomial Time Series Regression – Random Forest Ensemble in Intermittent Data

Author(s):  
Amri Muhaimin ◽  
Prismahardi Aji Riyantoko ◽  
Hendri Prabowo ◽  
Trimono Trimono

An intermittent dataset is challenging to forecast because it contains many zeros. Typical examples are sales data and rainfall data, since both can record nothing over certain periods. In this research, a model is created to overcome this problem using an ensemble approach. Intermittent data often follows the Negative Binomial distribution, because its variance exceeds its mean. We use two datasets: rainfall data and sales data. Our approach builds a base model from Negative Binomial time series regression and then augments it with a tree-based model, the random forest. Furthermore, we compare the result with two benchmark methods, the Croston method and Single Exponential Smoothing (SES). As a result, our approach outperforms the benchmarks, based on metric values of 1.79 and 7.18.
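The Croston benchmark mentioned above smooths the non-zero demand sizes and the inter-demand intervals separately, and forecasts their ratio. A minimal sketch (the demand series and smoothing constant are illustrative, not from the paper):

```python
def croston(demand, alpha=0.2):
    """Croston's method for intermittent demand: smooth non-zero demand
    sizes and inter-demand intervals separately; the per-period forecast
    is smoothed size / smoothed interval."""
    z = None      # smoothed non-zero demand size
    p = None      # smoothed inter-demand interval
    q = 1         # periods elapsed since the last non-zero demand
    forecasts = []
    for d in demand:
        forecasts.append(float('nan') if z is None else z / p)
        if d > 0:
            if z is None:              # first observed demand
                z, p = float(d), float(q)
            else:
                z += alpha * (d - z)   # update size estimate
                p += alpha * (q - p)   # update interval estimate
            q = 1
        else:
            q += 1
    return forecasts
```

For example, `croston([0, 3, 0, 0], alpha=0.2)` yields `[nan, nan, 1.5, 1.5]`: the first demand of 3 arrived after an interval of 2 periods, so the per-period rate is 3/2.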

2021 ◽  
Vol 14 (1) ◽  
pp. 77-82
Author(s):  
Rahmadini Darwas ◽  
Rahimullaily Rahimullaily ◽  
Naufal Abdi

This study aims to estimate the number of items sold at a mini market, the Tita shop, specifically Sari Murni cooking oil in 2-liter packs, for the next month, based on sales data from January 2016 to December 2017. The problem at the Tita shop is that it is difficult to estimate stock levels and the cost required for sales in the coming month, so a forecasting information system was built using the single exponential smoothing method, which assumes that the data fluctuates around a mean value without trend or seasonal components. The study estimates sales of 2-liter Sari Murni cooking oil in January 2018 at 42 pcs. Meanwhile, the estimated cost required to buy the 2-liter cooking oil stock in that period is Rp. 609.000,00, at a unit capital price of Rp. 14.500,00.
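Single exponential smoothing as described here keeps one running level that discounts older observations geometrically; the one-step-ahead forecast is the final level. A minimal sketch (the sales figures and smoothing constant are illustrative, not the shop's data):

```python
def ses_forecast(series, alpha=0.3):
    """Single exponential smoothing: level_t = alpha*x_t + (1-alpha)*level_{t-1}.
    The one-step-ahead forecast is the last smoothed level."""
    level = float(series[0])  # initialize with the first observation
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

monthly_sales = [40, 45, 38, 50, 42, 47, 41]
next_month = ses_forecast(monthly_sales, alpha=0.3)
```

A larger `alpha` tracks recent fluctuations more closely; a smaller one gives a smoother forecast, which suits data that hovers around a stable mean as assumed above.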


2011 ◽  
Vol 27 (4) ◽  
pp. 792-843 ◽  
Author(s):  
Song Xi Chen ◽  
Jiti Gao

This paper proposes a nonparametric simultaneous test for parametric specification of the conditional mean and variance functions in a time series regression model. The test is based on an empirical likelihood (EL) statistic that measures the goodness of fit between the parametric estimates and the nonparametric kernel estimates of the mean and variance functions. A unique feature of the test is its ability to distribute natural weights automatically between the mean and the variance components of the goodness-of-fit measure. To reduce the dependence of the test on a single pair of smoothing bandwidths, we construct an adaptive test by maximizing a standardized version of the empirical likelihood test statistic over a set of smoothing bandwidths. The test procedure is based on a bootstrap calibration to the distribution of the empirical likelihood test statistic. We demonstrate that the empirical likelihood test is able to distinguish local alternatives that are different from the null hypothesis at an optimal rate.
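The nonparametric side of the goodness-of-fit comparison can be sketched as Nadaraya–Watson kernel estimates of the conditional mean and variance, which a test like this would compare against the parametric fit over a grid of points. This is only the kernel-estimation step, with a Gaussian kernel and an illustrative bandwidth; the empirical likelihood weighting and bootstrap calibration are omitted:

```python
import math

def nw_mean_var(xs, ys, x, h):
    """Nadaraya-Watson estimates of E[y|x] and Var[y|x] at point x,
    using a Gaussian kernel with bandwidth h."""
    weights = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, ys)) / total
    second = sum(w * y * y for w, y in zip(weights, ys)) / total
    return mean, second - mean ** 2
```

In the adaptive version described above, the discrepancy between these estimates and the parametric mean/variance functions would be aggregated and maximized over a set of bandwidths `h`.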


2020 ◽  
Author(s):  
Javad Nazari ◽  
Parnia-Sadat Fathi ◽  
Nahid Sharahi ◽  
Majid Taheri ◽  
Payam Amini ◽  
...  

Abstract Background: Measles is a febrile condition ranked among the most infectious viral illnesses in the world. Despite the availability of a safe, accessible, affordable and effective vaccine, measles remains a worldwide concern. Methods: This study uses machine learning and time series methods to assess factors that place people at higher risk of measles. This historical cohort study covered measles incidence in Markazi Province, the center of Iran, from April 1997 to February 2020. Logistic regression, linear discriminant analysis, random forest, artificial neural network, bagging, support vector machine, and naïve Bayes were used for classification. Zero-inflated negative binomial regression for time series was used to assess the development of measles over time. Results: The prevalence of measles was 14.5% over the 24-year period, and an almost constant trend of near-zero cases was observed from 2002 to 2020. In order of importance, the independent variables were recent years, age, vaccination, rhinorrhea, male sex, contact with measles patients, cough, conjunctivitis, ethnicity, and fever. Younger age, a lower probability of contact, and absence of fever were associated with lower odds of zero cases. Only 7 new cases were forecast for the next two years. Bagging and random forest were the most accurate classification methods. Conclusion: Even though the number of new cases has been almost zero in recent years, age and contact were shown to be responsible for the non-occurrence of measles. October and May are prone to new cases in 2021 and 2022.
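The zero-inflated negative binomial model used above handles the long runs of zero-case periods by mixing a point mass at zero with an ordinary negative binomial count distribution. A minimal sketch of the probability mass function (all parameter values here are illustrative, not fitted values from the study):

```python
from math import comb

def nb_pmf(k, r, p):
    """Negative Binomial pmf: probability of k failures before the
    r-th success, with per-trial success probability p (integer r)."""
    return comb(k + r - 1, k) * (p ** r) * ((1 - p) ** k)

def zinb_pmf(k, pi, r, p):
    """Zero-inflated NB: extra point mass pi at zero mixed with NB(r, p)."""
    base = (1 - pi) * nb_pmf(k, r, p)
    return pi + base if k == 0 else base
```

The regression version ties `pi` and the NB mean to covariates (e.g. year, vaccination coverage) via link functions, which is how the model attributes excess zeros to specific factors.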


2004 ◽  
Vol 155 (5) ◽  
pp. 142-145 ◽  
Author(s):  
Claudio Defila

The record-breaking heatwave of 2003 also had an impact on the vegetation in Switzerland. To examine its influence, seven phenological late spring and summer phases were evaluated, together with six autumn phases, from a selection of stations. 30% of the 122 chosen phenological time series in late spring and summer phases set a new record (earliest arrival). The proportion of very early arrivals is very high and the mean deviation from the norm is between 10 and 20 days. The situation was less extreme in autumn, where 20% of the 103 time series chosen set a new record. The majority of the phenological arrivals were found in the class «normal», but the class «very early» is still well represented. The mean precocity lies between five and twenty days. As far as the leaf shedding of the beech is concerned, there was even a slight delay of around six days. The evaluation serves to show that the heatwave of 2003 strongly influenced the phenological events of spring and summer.


2009 ◽  
Vol 27 (1) ◽  
pp. 1-30 ◽  
Author(s):  
P. Prikryl ◽  
V. Rušin ◽  
M. Rybanský

Abstract. A sun-weather correlation, namely the link between solar magnetic sector boundary passage (SBP) by the Earth and upper-level tropospheric vorticity area index (VAI), that was found by Wilcox et al. (1974) and shown to be statistically significant by Hines and Halevy (1977) is revisited. A minimum in the VAI one day after SBP followed by an increase a few days later was observed. Using the ECMWF ERA-40 re-analysis dataset for the original period from 1963 to 1973 and extending it to 2002, we have verified what has become known as the "Wilcox effect" for the Northern as well as the Southern Hemisphere winters. The effect persists through years of high and low volcanic aerosol loading except for the Northern Hemisphere at 500 mb, when the VAI minimum is weak during the low aerosol years after 1973, particularly for sector boundaries associated with south-to-north reversals of the interplanetary magnetic field (IMF) BZ component. The "disappearance" of the Wilcox effect was found previously by Tinsley et al. (1994) who suggested that enhanced stratospheric volcanic aerosols and changes in air-earth current density are necessary conditions for the effect. The present results indicate that the Wilcox effect does not require high aerosol loading to be detected. The results are corroborated by a correlation with coronal holes where the fast solar wind originates. Ground-based measurements of the green coronal emission line (Fe XIV, 530.3 nm) are used in the superposed epoch analysis keyed by the times of sector boundary passage to show a one-to-one correspondence between the mean VAI variations and coronal holes. The VAI is modulated by high-speed solar wind streams with a delay of 1–2 days. The Fourier spectra of VAI time series show peaks at periods similar to those found in the solar corona and solar wind time series. In the modulation of VAI by solar wind the IMF BZ seems to control the phase of the Wilcox effect and the depth of the VAI minimum. 
The mean VAI response to SBP associated with the north-to-south reversal of BZ is leading by up to 2 days the mean VAI response to SBP associated with the south-to-north reversal of BZ. For the latter, less geoeffective events, the VAI minimum deepens (with the above exception of the Northern Hemisphere low-aerosol 500-mb VAI) and the VAI maximum is delayed. The phase shift between the mean VAI responses obtained for these two subsets of SBP events may explain the reduced amplitude of the overall Wilcox effect. In a companion paper, Prikryl et al. (2009) propose a new mechanism to explain the Wilcox effect, namely that solar-wind-generated auroral atmospheric gravity waves (AGWs) influence the growth of extratropical cyclones. It is also observed that severe extratropical storms, explosive cyclogenesis and significant sea level pressure deepenings of extratropical storms tend to occur within a few days of the arrival of high-speed solar wind. These observations are discussed in the context of the proposed AGW mechanism as well as the previously suggested atmospheric electrical current (AEC) model (Tinsley et al., 1994), which requires the presence of stratospheric aerosols for a significant (Wilcox) effect.
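The superposed epoch analysis referred to above is, at its core, an averaging of fixed-length segments of a time series centred on a list of key (event) times, here the sector boundary passages. A minimal sketch (the series, event indices and window are hypothetical):

```python
def superposed_epoch(series, key_indices, window):
    """Superposed epoch analysis: for each lag in [-window, +window],
    average the series values at (key time + lag) over all key times
    that keep the index in bounds."""
    lags = list(range(-window, window + 1))
    means = []
    for lag in lags:
        vals = [series[k + lag] for k in key_indices
                if 0 <= k + lag < len(series)]
        means.append(sum(vals) / len(vals))
    return lags, means
```

Keying the epochs to sector boundary passages and averaging suppresses uncorrelated meteorological noise, so a systematic dip and recovery in the VAI around the event, as reported above, stands out in the mean curve.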


2019 ◽  
Vol 23 (10) ◽  
pp. 4323-4331 ◽  
Author(s):  
Wouter J. M. Knoben ◽  
Jim E. Freer ◽  
Ross A. Woods

Abstract. A traditional metric used in hydrology to summarize model performance is the Nash–Sutcliffe efficiency (NSE). Increasingly an alternative metric, the Kling–Gupta efficiency (KGE), is used instead. When NSE is used, NSE = 0 corresponds to using the mean flow as a benchmark predictor. The same reasoning is applied in various studies that use KGE as a metric: negative KGE values are viewed as bad model performance, and only positive values are seen as good model performance. Here we show that using the mean flow as a predictor does not result in KGE = 0, but instead KGE = 1 − √2 ≈ −0.41. Thus, KGE values greater than −0.41 indicate that a model improves upon the mean flow benchmark – even if the model's KGE value is negative. NSE and KGE values cannot be directly compared, because their relationship is non-unique and depends in part on the coefficient of variation of the observed time series. Therefore, modellers who use the KGE metric should not let their understanding of NSE values guide them in interpreting KGE values and instead develop new understanding based on the constitutive parts of the KGE metric and the explicit use of benchmark values to compare KGE scores against. More generally, a strong case can be made for moving away from ad hoc use of aggregated efficiency metrics and towards a framework based on purpose-dependent evaluation metrics and benchmarks that allows for more robust model adequacy assessment.
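The KGE = 1 − √2 result for the mean-flow benchmark follows directly from the metric's three components: correlation r, variability ratio α = σ_sim/σ_obs, and bias ratio β = μ_sim/μ_obs. A constant simulation gives α = 0 and β = 1, and its correlation term is taken as 0 (its value in the limit of vanishing variability). A minimal sketch with illustrative flow values:

```python
import math
import statistics

def kge(sim, obs):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2).
    For a constant simulation, the correlation term r is set to 0."""
    mu_s, mu_o = statistics.mean(sim), statistics.mean(obs)
    sd_s, sd_o = statistics.pstdev(sim), statistics.pstdev(obs)
    if sd_s == 0:
        r = 0.0  # correlation with a constant series is undefined; use 0
    else:
        cov = sum((s - mu_s) * (o - mu_o)
                  for s, o in zip(sim, obs)) / len(obs)
        r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o   # variability ratio
    beta = mu_s / mu_o    # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs = [2.0, 5.0, 3.0, 7.0, 4.0]
bench = [statistics.mean(obs)] * len(obs)  # mean-flow benchmark
print(kge(bench, obs))  # 1 - sqrt(2) ≈ -0.414
```

With (r, α, β) = (0, 0, 1) the squared terms sum to 2, hence KGE = 1 − √2 regardless of the observed series, which is why −0.41 rather than 0 is the right baseline for "beats the mean flow".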

