Statistical Modelling
Latest Publications

Published By Sage Publications

ISSN: 1471-082X (print), 1477-0342 (online)

2022 ◽  
pp. 1471082X2110657
Sina Mews ◽  
Roland Langrock ◽  
Marius Ötting ◽  
Houda Yaqine ◽  
Jost Reinecke

Continuous-time state-space models (SSMs) are flexible tools for analysing irregularly sampled sequential observations that are driven by an underlying state process. Corresponding applications typically involve restrictive assumptions concerning linearity and Gaussianity to facilitate inference on the model parameters via the Kalman filter. In this contribution, we provide a general continuous-time SSM framework, allowing both the observation and the state process to be non-linear and non-Gaussian. Statistical inference is carried out by maximum approximate likelihood estimation, where multiple numerical integration within the likelihood evaluation is performed via a fine discretization of the state process. The corresponding reframing of the SSM as a continuous-time hidden Markov model, with structured state transitions, enables us to apply the associated efficient algorithms for parameter estimation and state decoding. We illustrate the modelling approach in a case study using data from a longitudinal study on delinquent behaviour of adolescents in Germany, revealing temporal persistence in the deviation of an individual's delinquency level from the population mean.
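The reframing described above can be sketched in code: a continuous state process is approximated on a fine grid of discrete states, the generator of the approximating Markov chain gives interval-specific transition matrices via the matrix exponential, and the standard HMM forward algorithm evaluates the (approximate) likelihood for irregularly sampled observations. The OU-type drift, grid, and Gaussian observation model below are illustrative assumptions, not the authors' exact specification.

```python
# Minimal sketch: continuous-time SSM likelihood via state discretization,
# i.e., a continuous-time HMM with structured transitions (illustrative only).
import numpy as np
from scipy.linalg import expm
from scipy.stats import norm

m = 50                                    # number of grid states (assumption)
grid = np.linspace(-3.0, 3.0, m)
h = grid[1] - grid[0]

# Tridiagonal generator approximating an OU-type diffusion (assumption).
theta, sigma = 0.5, 1.0
Q = np.zeros((m, m))
for i in range(m):
    drift = -theta * grid[i]
    if i + 1 < m:
        Q[i, i + 1] = sigma**2 / (2 * h**2) + max(drift, 0.0) / h
    if i - 1 >= 0:
        Q[i, i - 1] = sigma**2 / (2 * h**2) + max(-drift, 0.0) / h
    Q[i, i] = -Q[i].sum()

# Irregularly sampled observations: y_k = state + Gaussian noise (assumption).
times = np.array([0.0, 0.4, 1.1, 1.5, 2.8])
ys = np.array([0.2, -0.1, 0.5, 0.3, -0.4])

# Forward algorithm with interval-specific transition matrices expm(Q * dt).
phi = np.full(m, 1.0 / m)                 # uniform initial distribution
loglik = 0.0
prev_t = times[0]
for t, y in zip(times, ys):
    if t > prev_t:
        phi = phi @ expm(Q * (t - prev_t))
    phi = phi * norm.pdf(y, loc=grid, scale=0.3)
    c = phi.sum()
    loglik += np.log(c)
    phi /= c
    prev_t = t
```

The same filtered distributions `phi` can be stored per time point and reused for state decoding, which is the efficiency gain the HMM reframing buys.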

2021 ◽  
pp. 1471082X2110576
Laura Vana ◽  
Kurt Hornik

In this article, we propose a longitudinal multivariate model for binary and ordinal outcomes to describe the dynamic relationship among firm defaults and credit ratings from various raters. The latent probability of default is modelled as a dynamic process which contains additive firm-specific effects, a latent systematic factor representing the business cycle and idiosyncratic observed and unobserved factors. The joint set-up also facilitates the estimation of a bias for each rater which captures changes in the rating standards of the rating agencies. Bayesian estimation techniques are employed to estimate the parameters of interest. Several models are compared based on their out-of-sample prediction ability and we find that the proposed model outperforms simpler specifications. The joint framework is illustrated on a sample of publicly traded US corporates which are rated by at least one of the credit rating agencies S&P, Moody's and Fitch during the period 1995–2014.

2021 ◽  
pp. 1471082X2110626
Heather L. Turner ◽  
Andy D. Batchelor ◽  
David Firth

We propose a hazard model for entry into marriage, based on a bell-shaped function to model the dependence on age. We demonstrate near-aliasing in an extension that estimates the support of the hazard and mitigate this via re-parameterization. Our proposed model parameterizes the maximum hazard and corresponding age, thereby facilitating more general models where these features depend on covariates. For data on women's marriages from the Living in Ireland Surveys 1994–2001, this approach captures a reduced propensity to marry over successive cohorts and an increasing delay in the timing of marriage with increasing education.
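The key idea of parameterizing the hazard directly by its maximum and the age at which that maximum occurs can be sketched as follows. The Gaussian-shaped bell curve here is an illustrative choice; the authors' exact functional form may differ.

```python
# Sketch: bell-shaped marriage hazard parameterized by the peak hazard h_max
# and the age of peak propensity age_peak (illustrative parameterization).
import numpy as np

def hazard(age, h_max, age_peak, width):
    """Bell-shaped hazard attaining its maximum h_max exactly at age_peak."""
    return h_max * np.exp(-0.5 * ((age - age_peak) / width) ** 2)

def survival(age, h_max, age_peak, width, n=2000):
    """S(age) = exp(-cumulative hazard), via a trapezoid-rule integral."""
    grid = np.linspace(0.0, age, n)
    hz = hazard(grid, h_max, age_peak, width)
    cum_hazard = np.sum(0.5 * (hz[1:] + hz[:-1]) * np.diff(grid))
    return np.exp(-cum_hazard)

h_peak = hazard(26.0, h_max=0.15, age_peak=26.0, width=4.0)
s40 = survival(40.0, h_max=0.15, age_peak=26.0, width=4.0)  # P(unmarried by 40)
```

Because `h_max` and `age_peak` are themselves parameters, covariates such as cohort or education can be regressed on them directly, which is the advantage the abstract highlights.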

2021 ◽  
pp. 1471082X2110561
Alexander Volkmann ◽  
Almond Stöcker ◽  
Fabian Scheipl ◽  
Sonja Greven

Multivariate functional data can be intrinsically multivariate like movement trajectories in 2D or complementary such as precipitation, temperature and wind speeds over time at a given weather station. We propose a multivariate functional additive mixed model (multiFAMM) and show its application to both data situations using examples from sports science (movement trajectories of snooker players) and phonetic science (acoustic signals and articulation of consonants). The approach includes linear and nonlinear covariate effects and models the dependency structure between the dimensions of the responses using multivariate functional principal component analysis. Multivariate functional random intercepts capture both the auto-correlation within a given function and cross-correlations between the multivariate functional dimensions. They also allow us to model between-function correlations as induced by, for example, repeated measurements or crossed study designs. Modelling the dependency structure between the dimensions can generate additional insight into the properties of the multivariate functional process, improves the estimation of random effects, and yields corrected confidence bands for covariate effects. Extensive simulation studies indicate that a multivariate modelling approach is more parsimonious than fitting independent univariate models to the data while maintaining or improving model fit.
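The multivariate functional principal component step can be illustrated in a much-reduced form: stack the dimensions of each observed function into one long vector and eigen-decompose the pooled covariance, so that the leading eigenvectors capture cross-correlations between dimensions. The simulated two-dimensional curves below are illustrative and stand in for the snooker/phonetics data; the full multiFAMM additionally handles covariate effects and random intercepts.

```python
# Sketch: multivariate FPCA by stacking functional dimensions (simplified
# stand-in for the multiFAMM dependency model; data are simulated).
import numpy as np

rng = np.random.default_rng(1)
n, T = 40, 30                       # curves, grid points per dimension
t = np.linspace(0.0, 1.0, T)

# Two functional dimensions per subject sharing a latent score (assumption).
scores = rng.normal(size=(n, 2))
dim1 = scores[:, :1] * np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=(n, T))
dim2 = (scores[:, :1] * np.cos(2 * np.pi * t)
        + scores[:, 1:] * t + 0.1 * rng.normal(size=(n, T)))

# Stack dimensions into one long vector per subject; joint eigenfunctions
# then reflect correlation across the two dimensions.
X = np.hstack([dim1, dim2])
X = X - X.mean(axis=0)
cov = X.T @ X / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
explained = eigvals[:2].sum() / eigvals.sum()        # share of top 2 components
```

In this toy example the first joint eigenfunction spans both dimensions at once, which is exactly the cross-correlation structure a set of independent univariate FPCAs would miss.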

2021 ◽  
pp. 1471082X2110579
Eleonora Arnone ◽  
Laura M. Sangalli ◽  
Andrea Vicini

We consider spatio-temporal data and functional data with spatial dependence, characterized by complicated missing data patterns. We propose a new method capable of efficiently handling these data structures, including the case where data are missing over large portions of the spatio-temporal domain. The method is based on regression with partial differential equation regularization. The proposed model can accurately deal with data scattered over domains with irregular shapes and can accurately estimate fields exhibiting complicated local features. We demonstrate the consistency and asymptotic normality of the estimators. Moreover, we illustrate the good performance of the method in simulation studies, considering different missing data scenarios, from sparse data to more challenging scenarios where the data are missing over large portions of the spatial and temporal domains and the missing data are clustered in space and/or in time. The proposed method is compared to competing techniques, considering predictive accuracy and uncertainty quantification measures. Finally, we show an application to the analysis of lake surface water temperature data, which further illustrates the ability of the method to handle data featuring complicated patterns of missingness and highlights its potential for environmental studies.
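A grid-based analogue of PDE regularization makes the idea concrete: minimize a data-fit term over the observed locations plus a roughness penalty built from a discrete Laplacian, so the estimate is smoothly filled in over the missing regions. This is an illustrative finite-difference sketch, not the authors' finite element implementation.

```python
# Sketch: penalized least squares with a discrete Laplacian penalty,
#   minimize ||y_obs - f_obs||^2 + lam * ||L f||^2,
# as a stand-in for PDE-regularized spatial regression (illustrative).
import numpy as np

nx = 20
n = nx * nx
rng = np.random.default_rng(0)

# Smooth true field on a grid, observed with noise at ~60% of locations.
xs, ys_ = np.meshgrid(np.linspace(0, 1, nx), np.linspace(0, 1, nx))
f_true = np.sin(np.pi * xs) * np.cos(np.pi * ys_)
obs_mask = rng.random(n) < 0.6
y = f_true.ravel() + 0.1 * rng.normal(size=n)

# Discrete Laplacian L via the 5-point stencil (row sums are zero).
L = np.zeros((n, n))
def idx(i, j):
    return i * nx + j
for i in range(nx):
    for j in range(nx):
        k = idx(i, j)
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ii, jj = i + di, j + dj
            if 0 <= ii < nx and 0 <= jj < nx:
                L[k, idx(ii, jj)] = 1.0
                L[k, k] -= 1.0

# Solve the normal equations; unobserved points are filled in smoothly.
A = np.diag(obs_mask.astype(float))
lam = 1e-2
fhat = np.linalg.solve(A + lam * (L.T @ L), obs_mask * y)
rmse = np.sqrt(np.mean((fhat - f_true.ravel()) ** 2))
```

At unobserved grid points the data-fit row vanishes and the penalty alone determines the estimate, which is how large contiguous blocks of missing data are handled without imputation.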

2021 ◽  
pp. 1471082X2110605
Murray Aitkin ◽  
John Hinde ◽  
Brian Francis

A virtual interview with Murray Aitkin by Brian Francis and John Hinde, two of the original members of the Centre for Applied Statistics that Murray created at Lancaster University. The talk ranges over Murray's reflections of a career in statistical modelling and the many different collaborations across the world that have been such a significant part of it.

2021 ◽  
pp. 1471082X2110592
Jian-Wei Gou ◽  
Ye-Mao Xia ◽  
De-Peng Jiang

The two-part model (TPM) is a widely used statistical method for analysing semi-continuous data. Semi-continuous data can be viewed as arising from two distinct stochastic processes: one governs the occurrence or binary part of the data and the other determines the intensity or continuous part. In the regression setting with the semi-continuous outcome as a function of covariates, the binary part is commonly modelled via logistic regression and the continuous component via a log-normal model. The conventional TPM still imposes assumptions such as a log-normal distribution for the continuous part, no unobserved heterogeneity among the responses, and no collinearity among covariates, which are quite often unrealistic in practical applications. In this article, we develop a two-part nonlinear latent variable model (TPNLVM) with mixed multiple semi-continuous and continuous variables. The semi-continuous variables are treated as indicators of the latent factors along with other manifest variables. This reduces the dimensionality of the regression model and alleviates potential multicollinearity problems. Our TPNLVM can accommodate nonlinear relationships among the latent variables extracted from the factor analysis. To downweight the influence of distributional deviations and extreme observations, we develop a Bayesian semiparametric analysis procedure. The conventional parametric assumptions on the related distributions are relaxed and a Dirichlet process (DP) prior is used to improve model fitting. By taking advantage of the discreteness of the DP, our method is effective in capturing the heterogeneity underlying the population. Within the Bayesian paradigm, posterior inferences, including parameter estimates and model assessment, are carried out through Markov chain Monte Carlo (MCMC) sampling. To facilitate posterior sampling, we adapt the Pólya-Gamma stochastic representation for the logistic model. Using simulation studies, we examine the properties and merits of our proposed methods and illustrate our approach by evaluating the effect of treatment on cocaine use and examining whether the treatment effect is moderated by psychiatric problems.
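The conventional TPM baseline that this article generalizes can be fit by maximum likelihood in a few lines: a logistic model for whether the outcome is positive, and a log-normal model for its magnitude when it is. The simulated data and variable names below are illustrative.

```python
# Sketch: conventional two-part model (logistic + log-normal) fit by MLE,
# the baseline the TPNLVM extends (simulated illustrative data).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)

# Simulate: occurrence from a logistic model; positive values log-normal.
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
positive = rng.random(n) < p
y = np.where(positive,
             np.exp(1.0 + 0.8 * x + 0.5 * rng.normal(size=n)), 0.0)

def negloglik(par):
    a0, a1, b0, b1, log_s = par
    eta = a0 + a1 * x
    # Binary part: Bernoulli log-likelihood for the indicator y > 0.
    ll_bin = np.where(y > 0, eta, 0.0) - np.logaddexp(0.0, eta)
    # Continuous part: log-normal density for the positive observations.
    s = np.exp(log_s)
    pos = y > 0
    ll_pos = np.zeros(n)
    ll_pos[pos] = norm.logpdf(np.log(y[pos]), b0 + b1 * x[pos], s) - np.log(y[pos])
    return -(ll_bin + ll_pos).sum()

fit = minimize(negloglik, np.zeros(5), method="BFGS")
a1_hat, b1_hat = fit.x[1], fit.x[3]   # slopes of the binary / continuous parts
```

Because the two likelihood pieces share no parameters here, they could equally be maximized separately; the article's point is precisely that linking them through latent factors relaxes this independence.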

2021 ◽  
pp. 1471082X2110486
Maarten Coemans ◽  
Geert Verbeke ◽  
Maarten Naesens

The estimated glomerular filtration rate (eGFR) quantifies kidney graft function and is measured repeatedly after transplantation. Kidney graft rejection is diagnosed by performing biopsies on a regular basis (protocol biopsies at time of stable eGFR) or by performing biopsies due to clinical cause (indication biopsies at time of declining eGFR). The diagnostic value of the eGFR evolution as biomarker for rejection is not well established. To this end, we built a joint model which combines characteristics of transition models and shared parameter models to carry over information from one biopsy to the next, taking into account the longitudinal information of eGFR collected in between. From our model, applied to data of University Hospitals Leuven (870 transplantations, 2 635 biopsies), we conclude that a negative deviation from the mean eGFR slope increases the probability of rejection in indication biopsies, but that, on top of the biopsy history, there is little benefit in using the eGFR profile for diagnosing rejection. Methodologically, our model fills a gap in the biomarker literature by relating a frequently (repeatedly) measured continuous outcome with a less frequently (repeatedly) measured binary indicator. The developed joint transition model is flexible and applicable to multiple other research settings.

2021 ◽  
pp. 1471082X2110410
Elena Tuzhilina ◽  
Leonardo Tozzi ◽  
Trevor Hastie

Canonical correlation analysis (CCA) is a technique for measuring the association between two multivariate data matrices. A regularized modification of canonical correlation analysis (RCCA) which imposes an ℓ2 penalty on the CCA coefficients is widely used in applications with high-dimensional data. One limitation of such regularization is that it ignores any data structure, treating all the features equally, which can be ill-suited for some applications. In this article we introduce several approaches to regularizing CCA that take the underlying data structure into account. In particular, the proposed group regularized canonical correlation analysis (GRCCA) is useful when the variables are correlated in groups. We illustrate some computational strategies to avoid excessive computations with regularized CCA in high dimensions. We demonstrate the application of these methods in our motivating application from neuroscience, as well as in a small simulation example.
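The ℓ2-penalized RCCA baseline can be sketched as a generalized eigenproblem: add a ridge term to each covariance block and take the leading singular structure of the whitened cross-covariance. The simulated two-view data are illustrative; the group-structured GRCCA penalty would replace the identity matrix with a group-aware one.

```python
# Sketch: ridge-regularized CCA (RCCA) on simulated two-view data
# (illustrative; GRCCA would modify the regularizer's structure).
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 30, 20

# Two views sharing a single latent signal z (assumption).
z = rng.normal(size=(n, 1))
X = z @ rng.normal(size=(1, p)) + rng.normal(size=(n, p))
Y = z @ rng.normal(size=(1, q)) + rng.normal(size=(n, q))
X = X - X.mean(0)
Y = Y - Y.mean(0)

Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
lam = 0.1                                  # ridge penalty strength

def inv_sqrt(S, lam):
    """Inverse square root of S + lam * I via its eigendecomposition."""
    w, V = np.linalg.eigh(S + lam * np.eye(S.shape[0]))
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

Wx, Wy = inv_sqrt(Sxx, lam), inv_sqrt(Syy, lam)
U, svals, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
a, b = Wx @ U[:, 0], Wy @ Vt[0]            # first canonical coefficient pair

rho = np.corrcoef(X @ a, Y @ b)[0, 1]      # first canonical correlation
```

The ridge term keeps the whitening well-conditioned when p or q approaches (or exceeds) n, which is the high-dimensional setting the abstract refers to.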

2021 ◽  
pp. 1471082X2110439
Daniel M. Sheanshang ◽  
Philip A. White ◽  
Durban G. Keeler

In many settings, data acquisition generates outliers that can obscure inference. Therefore, practitioners often either identify and remove outliers or accommodate outliers using robust models. However, identifying and removing outliers is often an ad hoc process that affects inference, and robust methods are often too simple for some applications. In our motivating application, scientists drill snow cores and measure snow density to infer densification rates that aid in estimating snow water accumulation rates and glacier mass balances. Advanced measurement techniques can measure density at high resolution over depth but are sensitive to core imperfections, making them prone to outliers. Outlier accommodation is challenging in this setting because the distribution of outliers evolves over depth and the data demonstrate natural heteroscedasticity. To address these challenges, we present a two-component mixture model using a physically motivated snow density model and an outlier model, both of which evolve over depth. The physical component of the mixture model has a mean function with normally distributed depth-dependent heteroscedastic errors. The outlier component is specified using a semiparametric prior density process constructed through a normalized process convolution of log-normal random variables. We demonstrate that this model outperforms alternatives and can be used for various inferential tasks.
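A stripped-down version of the two-component idea can be sketched with an EM fit: each density measurement is either "physical" noise around a smooth depth curve or an outlier from a much wider distribution, and posterior membership probabilities flag likely outliers. The densification curve is assumed known here for illustration, and both components are Gaussian; the article's model instead evolves the outlier component semiparametrically over depth.

```python
# Sketch: two-component mixture for snow density over depth, fit by EM
# (much simplified; mean curve and distributions are illustrative).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
depth = np.linspace(0.0, 50.0, 400)

# Simulated data: densification-like mean curve plus noise, 5% outliers.
mean_true = 0.9 - 0.5 * np.exp(-depth / 20.0)
y = mean_true + 0.02 * rng.normal(size=depth.size)
out = rng.random(depth.size) < 0.05
y[out] += rng.normal(0.0, 0.2, out.sum())

# EM for y ~ (1 - pi) * N(mean, s1) + pi * N(mean, s2), with s2 >> s1.
pi_out, s1, s2 = 0.1, 0.05, 0.3
for _ in range(50):
    d1 = (1.0 - pi_out) * norm.pdf(y, mean_true, s1)
    d2 = pi_out * norm.pdf(y, mean_true, s2)
    w = d2 / (d1 + d2)                     # E-step: posterior outlier prob.
    pi_out = w.mean()                      # M-step: update mixture weight
    s1 = np.sqrt(np.sum((1 - w) * (y - mean_true) ** 2) / np.sum(1 - w))
    s2 = np.sqrt(np.sum(w * (y - mean_true) ** 2) / np.sum(w))

# Flag points the fitted model considers more likely outliers than not.
d1 = (1.0 - pi_out) * norm.pdf(y, mean_true, s1)
d2 = pi_out * norm.pdf(y, mean_true, s2)
flagged = d2 / (d1 + d2) > 0.5
```

Soft weights `w` downweight suspect measurements during fitting rather than deleting them, which is the accommodation-versus-removal trade-off the abstract describes.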
