Model Selection for Time Series Count Data with Over-Dispersion

Author(s):  
Saleh Ibrahim Musa ◽  
N. O. Nweze

Time series of counts with over-dispersion are a reality often encountered in many biomedical and public health applications. Statistical modelling of this type of series has been a great challenge. The Poisson and negative binomial distributions have routinely been used in practice for discrete count time series data, but their forms are too simplistic to accommodate features such as over-dispersion. Failure to account for these features while analysing such data may result in incorrect and sometimes misleading inferences, as well as the detection of spurious associations. There is therefore a need for further investigation of count time series models suitable for fitting count time series with different levels of over-dispersion. This study therefore proposed a best model that can fit and forecast time series count data with different levels of over-dispersion and different sample sizes. Simulation studies were conducted in the R statistical package to investigate the performance of the Autoregressive Conditional Poisson (ACP) and Poisson Autoregressive (PAR) models. The predictive ability of the models was observed at different numbers of steps ahead. The relative performance of the models was examined using the Akaike Information Criterion (AIC) and the Hannan-Quinn Information Criterion (HQIC). Conclusively, the best-fitting model was the ACP across the different sample sizes. The predictive ability of the fitted models increased as the sample size and the number of steps ahead increased.
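As a rough illustration of the workflow this abstract describes, the sketch below fits an ACP-style (INGARCH) count model with R's tscount package and compares it with a negative binomial alternative by AIC and a hand-computed HQIC; the simulated series, lag order, and choice of package are assumptions, not the authors' code.

```r
# Minimal sketch, assuming the 'tscount' package as one way to fit ACP-style models.
library(tscount)

set.seed(123)
y <- rnbinom(200, size = 2, mu = 5)                # placeholder over-dispersed count series

# ACP-style (INGARCH) fit with one lag of the observations, Poisson conditional distribution
fit_acp <- tsglm(y, model = list(past_obs = 1), distr = "poisson")
# Negative binomial counterpart for comparison under over-dispersion
fit_nb  <- tsglm(y, model = list(past_obs = 1), distr = "nbinom")

AIC(fit_acp); AIC(fit_nb)                          # compare fits by AIC

# Hannan-Quinn criterion computed by hand from the log-likelihood
hqic <- function(fit, n) -2 * as.numeric(logLik(fit)) + 2 * length(coef(fit)) * log(log(n))
hqic(fit_acp, length(y)); hqic(fit_nb, length(y))

predict(fit_acp, n.ahead = 5)$pred                 # multi-step-ahead point forecasts
```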

Author(s):  
Jae-Hyun Kim, Chang-Ho An

Due to the global economic downturn, the Korean economy continues to slump. Accordingly, the Bank of Korea implemented a monetary policy of cutting the base rate to respond actively to the economic slowdown and low prices, and economists have been trying to predict and analyze interest rate hikes and cuts. Therefore, in this study, a prediction model was estimated and evaluated using a vector autoregressive model with time series data of long- and short-term interest rates. The data used for this purpose were the call rate (1 day), the loan interest rate, and the Treasury rate (3 years) between January 2002 and December 2019, extracted monthly from the Bank of Korea database, and a vector autoregressive (VAR) model was used as the research model. The stationarity of the variables was confirmed by the ADF unit root test, and a bidirectional linear dependency relationship between the variables was confirmed by the Granger causality test. For model identification, the AICC, SBC, and HQC minimum information criteria were used. The significance of the parameters was confirmed through t-tests, and the fit of the estimated prediction model was confirmed by a significance test of the cross-correlation matrix and the multivariate Portmanteau test. As a result of forecasting the call rate, loan interest rate, and Treasury rate using the prediction model presented in this study, interest rates are predicted to continue to drop.
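A minimal sketch of the kind of VAR workflow described above, using the vars and tseries packages; the simulated interest-rate series, column names, and lag bound are illustrative assumptions rather than the paper's data or code.

```r
# Minimal sketch, assuming the 'vars' and 'tseries' packages; data are simulated stand-ins.
library(vars)
library(tseries)

set.seed(1)
rates <- ts(cbind(call       = cumsum(rnorm(216, sd = 0.1)),
                  loan       = cumsum(rnorm(216, sd = 0.1)),
                  treasury3y = cumsum(rnorm(216, sd = 0.1))),
            start = c(2002, 1), frequency = 12)         # monthly, Jan 2002 - Dec 2019

apply(rates, 2, function(x) adf.test(x)$p.value)        # ADF unit-root test per series

sel <- VARselect(rates, lag.max = 12, type = "const")   # AIC/HQ/SC/FPE lag selection
fit <- VAR(rates, p = sel$selection["HQ(n)"], type = "const")

causality(fit, cause = "call")                          # Granger causality of the call rate
serial.test(fit, lags.pt = 12, type = "PT.asymptotic")  # multivariate Portmanteau test

predict(fit, n.ahead = 12)                              # 12-month-ahead forecasts
```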


Author(s):  
Nendra Mursetya Somasih Dwipa

Stock return data are a type of time series data with high volatility and a variance that differs at every point in time. Such data are volatile, exhibit an asymmetric pattern, follow a nonstationary model, and do not have a constant residual variance (heteroscedasticity). ARCH and GARCH time series models can explain the heteroscedasticity of the data, but they are not always able to fully capture the asymmetric properties of high-frequency data. The Integrated Generalized Autoregressive Conditional Heteroscedasticity (IGARCH) model overcomes the weakness of GARCH in capturing a unit root. Furthermore, IGARCH models were used to estimate the Value at Risk (VaR), the maximum loss that will be incurred over a certain period at a certain confidence level. The aim of this study was to determine the best forecasting model for the Jakarta Composite Index (JCI). The models used in this study are ARCH, GARCH, and IGARCH. From the case studies carried out, forecasting the volatility of the stock index with IGARCH(1,1) gave a log-likelihood value of 3857.979, with information criteria AIC = -6.3180, BIC = -6.3013, SIC = -6.3180, and HQIC = -6.3117. The VaR of the JCI movement for an investment of Rp500,000,000.00 at a 95% confidence level on 2 July 2015, using the IGARCH(1,1) model, is Rp7,166,315.00.
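For concreteness, here is a minimal sketch of an IGARCH(1,1) fit and a one-day 95% VaR computed with the rugarch package; the simulated returns and normal innovations are assumptions, and the Rp500,000,000 position simply mirrors the figure quoted above.

```r
# Minimal sketch, assuming the 'rugarch' package; returns are a simulated stand-in for the JCI.
library(rugarch)

set.seed(42)
r <- rnorm(1000, sd = 0.01)                        # placeholder daily log returns

spec <- ugarchspec(variance.model = list(model = "iGARCH", garchOrder = c(1, 1)),
                   mean.model = list(armaOrder = c(0, 0)))
fit <- ugarchfit(spec, data = r)

infocriteria(fit)                                  # Akaike, Bayes, Shibata, Hannan-Quinn

# One-step-ahead 95% VaR for a Rp500,000,000 position (normal innovations assumed)
fc     <- ugarchforecast(fit, n.ahead = 1)
var_95 <- 500e6 * (fitted(fc) + qnorm(0.05) * sigma(fc))
abs(var_95)
```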


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Kassim Tawiah ◽  
Wahab Abdul Iddrisu ◽  
Killian Asampana Asosega

Discrete count time series data with an excessive number of zeros have warranted the development of zero-inflated time series models that incorporate the inflation of zeros and the overdispersion that comes with it. In this paper, we investigated the characteristics of the trend of the daily count of COVID-19 deaths in Ghana using zero-inflated models. We envisaged that the trend of COVID-19 deaths per day in Ghana portrays a general increase from the onset of the pandemic in the country to about day 160, after which there is a general decrease onward. We fitted a zero-inflated Poisson autoregressive model and a zero-inflated negative binomial autoregressive model to the data in the partial-likelihood framework. The zero-inflated negative binomial autoregressive model outperformed the zero-inflated Poisson autoregressive model. On the other hand, the dynamic zero-inflated Poisson autoregressive model performed better than the dynamic negative binomial autoregressive model. The predicted new deaths based on the zero-inflated negative binomial autoregressive model indicated that Ghana's COVID-19 deaths per day would rise sharply a few days after 30th November 2020 and fall drastically, just as in the observed data.
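As a hedged stand-in for the partial-likelihood zero-inflated autoregressions described here, the sketch below fits zero-inflated Poisson and negative binomial regressions with a lagged count as covariate using the pscl package; the simulated death series and the lag-one specification are assumptions, not the authors' model.

```r
# Minimal sketch, assuming 'pscl::zeroinfl' with a lagged count as a rough autoregressive proxy.
library(pscl)

set.seed(7)
deaths <- rnbinom(300, size = 1, mu = 2) * rbinom(300, 1, 0.6)   # placeholder zero-heavy counts

d <- data.frame(y = deaths[-1], y_lag1 = deaths[-length(deaths)])

fit_zip  <- zeroinfl(y ~ y_lag1 | 1, data = d, dist = "poisson")
fit_zinb <- zeroinfl(y ~ y_lag1 | 1, data = d, dist = "negbin")

AIC(fit_zip, fit_zinb)                             # compare zero-inflated Poisson vs NB fits
```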


Author(s):  
Nicholas Hoernle ◽  
Kobi Gal ◽  
Barbara Grosz ◽  
Leilah Lyons ◽  
Ada Ren ◽  
...  

This paper describes methods for the comparative evaluation of the interpretability of models of high-dimensional time series data inferred by unsupervised machine learning algorithms. The time series data used in this investigation were logs from an immersive simulation like those commonly used in education and healthcare training. The structures learnt by the models provide representations of participants' activities in the simulation which are intended to be meaningful for people to interpret. To choose the model that induces the best representation, we designed two interpretability tests, each of which evaluates the extent to which a model's output aligns with people's expectations or intuitions of what has occurred in the simulation. We compared the performance of the models on these interpretability tests to their performance on statistical information criteria. We show that the models that optimize interpretability quality differ from those that optimize (statistical) information-theoretic criteria. Furthermore, we found that a model using a fully Bayesian approach performed well on both the statistical and human-interpretability measures. The Bayesian approach is a good candidate for fully automated model selection, i.e., when direct empirical investigations of interpretability are costly or infeasible.


2021 ◽  
Vol 10 (s1) ◽  
Author(s):  
Sami Khedhiri

Objectives: Modeling and forecasting possible trajectories of COVID-19 infections and deaths using statistical methods is one of the most important topics at the present time. However, statistical models use different assumptions and methods and thus yield different results. One issue in monitoring disease progression over time is how to handle excess zero counts. In this research, we assess the empirical performance of these models in terms of their fit and forecast accuracy for COVID-19 deaths.

Methods: Two types of models have been suggested in the literature to study count time series data. The first is based on Poisson and negative binomial conditional probability distributions to account for data overdispersion, and uses autoregression to account for dependence among the responses. The second is based on zero-inflated mixed autoregression and also uses exponential-family conditional distributions. We study the goodness of fit and forecast accuracy of these count time series models, based on autoregressive conditional count distributions with and without zero inflation.

Results: We illustrate these methods using recently published online COVID-19 data for Tunisia, which report daily death counts from March 2020 to February 2021. We perform an empirical analysis and compare the fit and forecast performance of these models for death counts in the presence of an intervention policy. Our statistical findings show that models that account for zero inflation produce a better fit and more accurate forecasts of the pandemic deaths.

Conclusions: This paper shows that infectious disease data with excess zero counts are better modelled with zero-inflated models. These models yield more accurate predictions of deaths related to the pandemic than the generalized count data models. In addition, our statistical results find that the lifting of travel restrictions has a significant impact on the surge of COVID-19 deaths. One plausible explanation of the outperformance of zero-inflated models is that the zero values are related to an intervention policy and are therefore structural.
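The sketch below illustrates, under stated assumptions, a simple holdout check of forecast accuracy for an autoregressive count model of the kind compared in this paper, using tscount's tsglm on a simulated zero-heavy series; it is not the authors' model or data.

```r
# Minimal sketch of a holdout forecast-accuracy check, assuming the 'tscount' package.
library(tscount)

set.seed(2021)
y <- rnbinom(330, size = 1.5, mu = 4) * rbinom(330, 1, 0.7)   # placeholder zero-heavy daily counts
train <- y[1:300]; test <- y[301:330]

fit <- tsglm(train, model = list(past_obs = 1, past_mean = 1), distr = "nbinom")

fc   <- predict(fit, n.ahead = 30)$pred            # 30-day-ahead point forecasts
rmse <- sqrt(mean((test - fc)^2))                  # forecast accuracy on the holdout
rmse
```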


1989 ◽  
Vol 40 (3) ◽  
pp. 241 ◽  
Author(s):  
DR Welsh ◽  
DB Stewart

Intervention analysis is a rigorous statistical modelling technique used to measure the effect of a shift in the mean level of a time series, caused by an intervention. A general formulation of an intervention model is applied to water-quality data for two streams in north-eastern Victoria, measuring the effect of drought on the electrical conductivity of one stream, and the effect of bushfires on the flow and turbidity of the other. The nature of the intervention is revealed using exploratory data-analysis techniques, such as smoothing and boxplots, on the time-series data. Intervention analysis is then used to confirm the identified changes and estimate their magnitude. The increased level of electrical conductivity due to drought is determined by three techniques of estimation and the results compared. The best of these techniques is then used to model changes in stream flow and turbidity following bushfires in the catchment.
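A minimal sketch of a step-change intervention model of this kind, fitted with base R's arima() and an external step regressor; the simulated conductivity-like series and the intervention time are illustrative assumptions, not the authors' data.

```r
# Minimal sketch: AR(1) model with a step intervention regressor, base R only.
set.seed(3)
n    <- 120
t0   <- 61                                         # assumed time of the intervention
step <- as.numeric(seq_len(n) >= t0)               # 0 before the intervention, 1 after

# simulated series with a level shift of 8 units at the intervention (e.g. conductivity)
ec <- 50 + 8 * step + arima.sim(list(ar = 0.5), n = n, sd = 2)

fit <- arima(ec, order = c(1, 0, 0), xreg = step)
coef(fit)                                          # the step coefficient estimates the mean-level shift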


2009 ◽  
Vol 2009 ◽  
pp. 1-37 ◽  
Author(s):  
Risa Kato ◽  
Takayuki Shiohama

Semiparametric regression models are very useful for time series analysis. They facilitate the detection of features resulting from external interventions. The complexity of semiparametric models poses new challenges for issues of nonparametric and parametric inference and model selection that frequently arise in time series data analysis. In this paper, we propose penalized least squares estimators which can simultaneously select significant variables and estimate unknown parameters. An innovative class of variable selection procedures is proposed to select significant variables and basis functions in a semiparametric model. The asymptotic normality of the resulting estimators is established. Information criteria for model selection are also proposed. We illustrate the effectiveness of the proposed procedures with numerical simulations.
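As a rough analogue (not the authors' estimator), the sketch below performs simultaneous smoothing and term selection in a semiparametric model with mgcv's shrinkage penalties (select = TRUE); the simulated covariates and AR(1) errors are assumptions.

```r
# Minimal sketch, assuming 'mgcv' shrinkage smoothers as a stand-in for penalized selection.
library(mgcv)

set.seed(11)
n  <- 300
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)      # x3 is irrelevant by construction
y  <- sin(2 * pi * x1) + 0.5 * x2 +
      as.numeric(arima.sim(list(ar = 0.4), n = n, sd = 0.3))   # autocorrelated errors

fit <- gam(y ~ s(x1) + s(x2) + s(x3), select = TRUE, method = "REML")
summary(fit)                                        # terms shrunk toward zero indicate deselection
AIC(fit)                                            # information criterion for model comparison
```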


Author(s):  
Kimberly F. Sellers ◽  
Ali Arab ◽  
Sean Melville ◽  
Fanyu Cui

Al-Osh and Alzaid (1988) consider a Poisson moving average (PMA) model to describe the relation among integer-valued time series data; this model, however, is constrained by the underlying equi-dispersion assumption for count data (i.e., that the variance and the mean are equal). This work instead introduces a flexible integer-valued moving average model for count data that contain over- or under-dispersion via the Conway-Maxwell-Poisson (CMP) distribution and related distributions. This first-order sum-of-Conway-Maxwell-Poissons moving average (SCMPMA(1)) model offers a generalizable construct that includes the PMA (among others) as a special case. We highlight the SCMPMA model properties and illustrate its flexibility via simulated data examples.
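For illustration only, the sketch below simulates an MA(1)-style series built from Conway-Maxwell-Poisson innovations using a hand-rolled truncated CMP sampler, to show how the dispersion parameter nu moves the series away from equi-dispersion; the construction is a simplified stand-in, not the authors' SCMPMA(1) definition.

```r
# Minimal sketch: truncated CMP sampler and a simple sum-of-CMPs MA(1)-type series.
rcmp <- function(n, lambda, nu, max_x = 200) {
  x <- 0:max_x
  p <- exp(x * log(lambda) - nu * lfactorial(x))    # unnormalised CMP pmf on a truncated support
  sample(x, n, replace = TRUE, prob = p / sum(p))
}

set.seed(5)
eps <- rcmp(501, lambda = 3, nu = 0.7)              # nu < 1 gives over-dispersed innovations
x   <- eps[-1] + eps[-501]                          # X_t = eps_t + eps_{t-1}

c(mean = mean(x), var = var(x))                     # variance exceeds the mean here
```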


Author(s):  
Samuel Olorunfemi Adams ◽  
Rueben Adeyemi Ipinyomi

Spline smoothing is used to filter out noise or disturbance in an observation, and its performance depends on the choice of the smoothing parameter. There are many methods of estimating smoothing parameters; the most popular among them are Generalized Maximum Likelihood (GML), Generalized Cross-Validation (GCV), and Unbiased Risk (UBR), and these methods tend to overfit the smoothing parameter in the presence of autocorrelated errors. A new spline smoothing estimation method is proposed and compared with the three existing methods in order to eliminate the overfitting problem associated with the presence of autocorrelation in the error term. This is demonstrated through a simulation study, performed with a program written in R, based on the predictive mean square error (PMSE) criterion. The results indicated that the PMSE of the four smoothing methods decreases as the smoothing parameter increases and as the sample size increases. This study found that the proposed smoothing method is the best for time series observations with autocorrelated errors because it does not overfit and works well for large sample sizes. The study will help researchers overcome the overfitting problem associated with applying the smoothing spline method to time series observations.
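A minimal sketch of the kind of experiment described, assuming a GCV-chosen smoothing spline (base R's smooth.spline) fitted to a signal with AR(1) errors and evaluated by out-of-sample PMSE; the signal, error process, and train/test split are illustrative.

```r
# Minimal sketch: smoothing spline under autocorrelated errors, with a PMSE evaluation.
set.seed(9)
n <- 150
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 0.3))

train <- seq(1, n, by = 2); test <- seq(2, n, by = 2)       # interleaved hold-out

fit  <- smooth.spline(x[train], y[train])                   # smoothing parameter chosen by GCV (default)
pred <- predict(fit, x[test])$y
pmse <- mean((y[test] - pred)^2)                            # predictive mean square error
pmse
```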

