Stats
Latest Publications


TOTAL DOCUMENTS: 145 (FIVE YEARS: 129)

H-INDEX: 3 (FIVE YEARS: 3)

Published By MDPI AG

ISSN 2571-905X

Stats ◽  
2022 ◽  
Vol 5 (1) ◽  
pp. 70-88
Author(s):  
Johannes Ferreira ◽  
Ané van der Merwe

This paper proposes a previously unconsidered generalization of the Lindley distribution by allowing for a measure of noncentrality. Essential structural characteristics are investigated and derived in explicit and tractable forms, and the estimability of the model is illustrated via a fit of the developed model to real data. Subsequently, this model is used as a candidate for the parameter of a Poisson model, which allows for departures from the usual equidispersion restriction that the Poisson imposes when modelling count data. This Poisson-noncentral Lindley model is also systematically investigated and its characteristics are derived. The value of this count model is illustrated by implementing it as the count error distribution in an integer autoregressive environment and juxtaposing it against other popular models. The effect of the systematically induced noncentrality parameter is illustrated and paves the way for future flexible modelling, not only as a standalone contender in continuous Lindley-type scenarios but also in discrete and discrete-time-series scenarios where the often-assumed equidispersion does not hold in practical data environments.
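As a rough illustration of the mixing idea behind such count models, the sketch below simulates the classical (central) Poisson-Lindley mixture: a Poisson rate drawn from a Lindley distribution yields counts whose variance exceeds their mean. The noncentral generalization proposed in the paper is not reproduced here, and the parameter value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lindley(theta, size):
    """Sample from a Lindley(theta) distribution, using its representation as
    the mixture (theta/(theta+1)) * Exp(theta) + (1/(theta+1)) * Gamma(2, 1/theta)."""
    pick_exp = rng.random(size) < theta / (theta + 1.0)
    exp_draws = rng.exponential(scale=1.0 / theta, size=size)
    gamma_draws = rng.gamma(shape=2.0, scale=1.0 / theta, size=size)
    return np.where(pick_exp, exp_draws, gamma_draws)

theta = 1.5                           # arbitrary illustrative value
lam = sample_lindley(theta, 100_000)  # Poisson rates mixed over a Lindley law
counts = rng.poisson(lam)             # Poisson-Lindley counts

# The mixture is overdispersed: the variance exceeds the mean,
# unlike a plain (equidispersed) Poisson distribution.
print(counts.mean(), counts.var())
```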


Stats ◽  
2022 ◽  
Vol 5 (1) ◽  
pp. 52-69
Author(s):  
Darcy Steeg Morris ◽  
Kimberly F. Sellers

Clustered count data are commonly modeled using Poisson regression with random effects to account for the correlation induced by clustering. The Poisson mixed model allows for overdispersion via the within-cluster correlation; however, departures from equidispersion may also exist due to the underlying count process mechanism. We study the cross-sectional COM-Poisson regression model—a generalized regression model for count data in light of data dispersion—together with random effects for the analysis of clustered count data. We demonstrate the flexibility of the COM-Poisson random intercept model, including the choice of the random effect distribution, via simulated and real data examples. We find that COM-Poisson mixed models provide comparable model fit to well-known mixed models for the associated special cases of clustered discrete data, and result in improved model fit for data with intermediate levels of over- or underdispersion in the count mechanism. Accordingly, the proposed models are useful for capturing dispersion not consistent with commonly used statistical models, and also serve as a practical diagnostic tool.
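For readers unfamiliar with the underlying count distribution, the following minimal sketch evaluates the Conway-Maxwell-Poisson (COM-Poisson) probability mass function and shows how the dispersion parameter ν moves the variance away from the mean; it illustrates the distribution only, not the authors' mixed-model fitting code.

```python
import math
import numpy as np
from scipy.special import gammaln, logsumexp

def com_poisson_pmf(y, lam, nu, max_terms=200):
    """COM-Poisson pmf P(Y = y) = lam**y / ((y!)**nu * Z(lam, nu)),
    computed in log space; the normalizing constant Z is truncated
    to max_terms terms of the infinite series."""
    j = np.arange(max_terms)
    log_terms = j * math.log(lam) - nu * gammaln(j + 1)
    log_z = logsumexp(log_terms)
    return np.exp(y * math.log(lam) - nu * gammaln(y + 1) - log_z)

def mean_and_var(lam, nu, max_terms=200):
    y = np.arange(max_terms)
    pmf = com_poisson_pmf(y, lam, nu, max_terms)
    mean = np.sum(y * pmf)
    return mean, np.sum((y - mean) ** 2 * pmf)

# nu = 1 recovers the Poisson (variance ~= mean);
# nu < 1 gives overdispersion, nu > 1 gives underdispersion.
for nu in (0.5, 1.0, 2.0):
    print(nu, mean_and_var(lam=3.0, nu=nu))
```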


Stats ◽  
2021 ◽  
Vol 5 (1) ◽  
pp. 26-51
Author(s):  
Paul Doukhan ◽  
Joseph Rynkiewicz ◽  
Yahia Salhi

This article proposes an optimal and robust methodology for model selection. The model of interest is a parsimonious alternative framework for modeling the stochastic dynamics of mortality improvement rates introduced recently in the literature. The approach models mortality improvements using a random field specification with a given causal structure instead of the commonly used factor-based decomposition framework. It captures some well-documented stylized facts of mortality behavior, including dependencies among adjacent cohorts, cohort effects, cross-generation correlations, and the conditional heteroskedasticity of mortality. Such a class of models is a generalization of the now widely used AR-ARCH models for univariate processes. As the framework is general, a simple variant called the three-level memory model was investigated and illustrated. However, it is not clear which parameterization is best suited to specific mortality applications. In this paper, we investigate the optimal model choice and parameter selection among potential candidate models. More formally, we propose a methodology well-suited to such a random field that selects the best model in the sense that the model is not only correct but also the most economical among all the correct models. Formally, we show that a criterion based on a penalization of the log-likelihood, e.g., the Bayesian Information Criterion, is consistent. Finally, we investigate the methodology with Monte Carlo experiments as well as real-world datasets.
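The consistency result concerns a penalized log-likelihood criterion; the sketch below illustrates the generic BIC comparison it relies on, using simple univariate AR fits from statsmodels as stand-ins for the paper's random-field specifications. The data and candidate set are purely illustrative.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Simulated stand-in series; the paper's mortality-improvement random field
# is replaced by a univariate AR(2) process purely for illustration.
rng = np.random.default_rng(1)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 0.5 * x[t - 1] - 0.2 * x[t - 2] + rng.normal()

# BIC = -2 log L + k log n: the penalty grows with the number of parameters k,
# so the criterion favours the most economical among the correct models.
bic = {}
for p in range(1, 6):
    res = AutoReg(x, lags=p).fit()
    k = p + 2                                  # AR coefficients + intercept + noise variance
    bic[p] = -2 * res.llf + k * np.log(res.nobs)

best_order = min(bic, key=bic.get)
print(bic, "selected lag order:", best_order)
```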


Stats ◽  
2021 ◽  
Vol 5 (1) ◽  
pp. 12-25
Author(s):  
Jean Chung ◽  
Guanchao Tong ◽  
Jiayou Chao ◽  
Wei Zhu

Global sea-level rise has been drawing increasingly greater attention in recent years, as it directly impacts the livelihood and sustainable development of humankind. Our research focuses on identifying causal factors and pathways of sea level changes (both global and regional) and subsequently predicting the magnitude of such changes. To this end, we have designed a novel analysis pipeline comprising three sequential steps: (1) a dynamic structural equation model (dSEM) to identify pathways between the global mean sea level (GMSL) and various predictors, (2) a vector autoregression model (VAR) to quantify the GMSL changes due to the significant relations identified in the first step, and (3) a generalized additive model (GAM) to model the relationship between regional sea level and GMSL. Historical records of GMSL and other variables from 1992 to 2020 were used to calibrate the analysis pipeline. Our results indicate that greenhouse gases, water and air temperatures, changes in Antarctic and Greenland Ice Sheet mass, sea ice, and historical sea level all play a significant role in future sea-level rise. The resulting 95% upper bound of the sea-level projections was combined with a threshold for extreme flooding to map out the extent of sea-level rise in coastal communities using a digital coastal tracker.
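The sketch below illustrates only step (2) of the pipeline, fitting a vector autoregression with statsmodels and projecting the first series forward; the series, column names, and lag order are placeholders, not the variables or settings used in the paper.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Synthetic annual series standing in for GMSL and two of its drivers
# (the paper uses 1992-2020 historical records; names here are placeholders).
rng = np.random.default_rng(5)
years = pd.period_range("1992", "2020", freq="Y")
t = np.arange(len(years))
df = pd.DataFrame({
    "gmsl": 3.0 * t + rng.normal(0, 2, len(years)),
    "co2": 2.0 * t + rng.normal(0, 1, len(years)),
    "ocean_temp": 0.02 * t + rng.normal(0, 0.05, len(years)),
}, index=years)

# Step (2) of the pipeline: fit a VAR on the retained predictors and
# project the GMSL series forward (lag order fixed at 2 for illustration).
results = VAR(df).fit(2)
forecast = results.forecast(df.values[-results.k_ar:], steps=10)
print(forecast[:, 0])   # projected GMSL path (first column of the system)
```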


Stats ◽  
2021 ◽  
Vol 5 (1) ◽  
pp. 1-11
Author(s):  
Felix Mbuga ◽  
Cristina Tortora

Cluster analysis seeks to assign objects with similar characteristics into groups called clusters, so that objects within a group are similar to each other and dissimilar to objects in other groups. Spectral clustering has been shown to perform well in different scenarios on continuous data: it can detect both convex and non-convex clusters, as well as overlapping clusters. However, the constraint of continuous data can be limiting in real applications, where data are often of mixed type, i.e., data that contain both continuous and categorical features. This paper looks at extending spectral clustering to mixed-type data. The new method replaces the Euclidean-based similarity distance used in conventional spectral clustering with different dissimilarity measures for continuous and categorical variables. A global dissimilarity measure is then computed using a weighted sum, and a Gaussian kernel is used to convert the dissimilarity matrix into a similarity matrix. The new method includes automatic tuning of the variable weight and kernel parameter. The performance of spectral clustering in different scenarios is compared with that of two state-of-the-art mixed-type data clustering methods, k-prototypes and KAMILA, using several simulated and real data sets.
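A minimal sketch of the recipe described above (separate dissimilarities for continuous and categorical variables, a weighted sum, a Gaussian kernel, and spectral clustering on the precomputed affinity) is given below; the automatic tuning of the weight and kernel parameter proposed in the paper is not reproduced, so both are fixed by hand here.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import SpectralClustering

def mixed_spectral(X_cont, X_cat, n_clusters, w=0.5, sigma=1.0):
    """Sketch of spectral clustering for mixed-type data: Euclidean distance
    on continuous columns, simple-matching (Hamming) distance on categorical
    columns, weighted sum, Gaussian kernel, then spectral clustering on the
    precomputed affinity matrix. The weight w and bandwidth sigma are fixed
    here, whereas the paper tunes both automatically."""
    d_cont = cdist(X_cont, X_cont, metric="euclidean")
    d_cont = d_cont / d_cont.max()                 # put both distances on [0, 1]
    d_cat = cdist(X_cat, X_cat, metric="hamming")  # share of mismatching categories
    d = w * d_cont + (1 - w) * d_cat
    affinity = np.exp(-d**2 / (2 * sigma**2))      # Gaussian kernel -> similarity
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
    return model.fit_predict(affinity)

# Tiny synthetic example: two continuous and two categorical features.
rng = np.random.default_rng(2)
X_cont = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
X_cat = np.vstack([rng.integers(0, 2, (50, 2)), rng.integers(2, 4, (50, 2))])
labels = mixed_spectral(X_cont, X_cat.astype(float), n_clusters=2)
print(labels)
```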


Stats ◽  
2021 ◽  
Vol 4 (4) ◽  
pp. 1091-1115
Author(s):  
Bradley Efron

This article was prepared for the Special Issue on Resampling Methods for Statistical Inference of the 2020s. Modern algorithms such as random forests and deep learning are automatic machines for producing prediction rules from training data. Resampling plans have been the key technology for evaluating a rule’s prediction accuracy. After a careful description of how prediction error is measured, the article discusses the advantages and disadvantages of the principal methods: cross-validation, the nonparametric bootstrap, covariance penalties (Mallows’ Cp and the Akaike Information Criterion), and conformal inference. The emphasis is on a broad overview of a large subject, featuring examples, simulations, and a minimum of technical detail.
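As a small, self-contained illustration of one of the surveyed resampling plans, the sketch below estimates a random forest's prediction error by 10-fold cross-validation on synthetic data; it is not one of the article's own examples.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic training data standing in for the article's examples.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# 10-fold cross-validation: each fold is held out once, the prediction rule
# is refit on the remaining folds, and the squared error is averaged.
model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=10, scoring="neg_mean_squared_error")
print("cross-validated estimate of prediction error (MSE):", -scores.mean())
```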


Stats ◽  
2021 ◽  
Vol 4 (4) ◽  
pp. 1069-1079
Author(s):  
Marcel Ausloos ◽  
Claudiu Herteliu

The paper focuses on the growth and/or decline in the number of devotees in UK Dioceses of the Church of England during the “Decade of Evangelism” [1990–2000]. In this study, rank-size relationships and subsequent correlations are searched for through various performance indicators of evangelism management. A strong structural regularity is found. Moreover, it is shown that such key indicators appear to fall into two different classes. This unexpected feature seems to indicate some basic universality regimes, in particular in distinguishing behaviour measures. Rank correlations between indicator measures further emphasise some differences in evangelism management between Evangelical and Catholic Anglican tradition dioceses (or rather bishops) during that time interval.
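A generic sketch of a rank-size fit of the kind described above is given below: sort one indicator across units, regress log(size) on log(rank), and read the exponent and R² as a measure of structural regularity. The values are placeholders, not the Church of England indicators.

```python
import numpy as np

# Placeholder indicator values, one per diocese (not the actual data).
values = np.array([1200, 950, 870, 640, 600, 520, 480, 430, 400, 360,
                   330, 300, 280, 250, 230, 210, 200, 180, 160, 150], dtype=float)

# Rank-size analysis: sort in decreasing order and fit log(size) = a - b*log(rank).
size = np.sort(values)[::-1]
rank = np.arange(1, len(size) + 1)
slope, intercept = np.polyfit(np.log(rank), np.log(size), 1)

fitted = intercept + slope * np.log(rank)
ss_res = np.sum((np.log(size) - fitted) ** 2)
ss_tot = np.sum((np.log(size) - np.log(size).mean()) ** 2)
print("rank-size exponent:", -slope, "R^2:", 1 - ss_res / ss_tot)
```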


Stats ◽  
2021 ◽  
Vol 4 (4) ◽  
pp. 1080-1090
Author(s):  
Oke Gerke ◽  
Sören Möller

Bland–Altman agreement analysis has gained widespread application across disciplines, not least in the health sciences, since its inception in the 1980s. Bayesian analysis has been on the rise due to increased computational power over time, and Alari, Kim, and Wand have put Bland–Altman Limits of Agreement in a Bayesian framework (Meas. Phys. Educ. Exerc. Sci. 2021, 25, 137–148). We contrasted the prediction of a single future observation and the estimation of the Limits of Agreement from the frequentist and a Bayesian perspective by analyzing interrater data from two sequentially conducted preclinical studies. The estimation of the Limits of Agreement θ1 and θ2 has wider applicability than the prediction of single future differences. While a frequentist confidence interval represents a range of nonrejectable values for null hypothesis significance testing of H0: θ1 ≤ -δ or θ2 ≥ δ against H1: θ1 > -δ and θ2 < δ, with a predefined benchmark value δ, Bayesian analysis allows for direct interpretation of both the posterior probability of the alternative hypothesis and the likelihood of parameter values. We discuss group-sequential testing and nonparametric alternatives briefly. Given improved computational resources, frequentist simplicity does not beat Bayesian interpretability, but the elicitation and implementation of prior information demand caution. Accounting for clustered data (e.g., repeated measurements per subject) is well established in frequentist, but not yet in Bayesian, Bland–Altman analysis.
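For reference, the frequentist side of the comparison can be sketched as follows: Bland–Altman limits of agreement with approximate confidence intervals computed from paired interrater measurements. The data are placeholders, and the Bayesian counterpart (a posterior for θ1 and θ2) is not reproduced here.

```python
import numpy as np
from scipy import stats

# Placeholder paired ratings from two raters (not the preclinical data).
rater1 = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 6.2, 5.7, 5.0, 5.3, 6.1])
rater2 = np.array([5.3, 4.6, 6.3, 5.4, 5.2, 6.0, 5.9, 4.8, 5.6, 6.4])

d = rater1 - rater2
n = len(d)
mean_d, sd_d = d.mean(), d.std(ddof=1)

# Frequentist 95% limits of agreement (theta1, theta2): mean difference +/- 1.96 SD.
loa = (mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d)

# Approximate standard error of each limit (Bland & Altman, 1986) and
# t-based 95% confidence intervals around the two limits.
se_loa = sd_d * np.sqrt(1.0 / n + 1.96**2 / (2 * (n - 1)))
t = stats.t.ppf(0.975, df=n - 1)
ci_lower = (loa[0] - t * se_loa, loa[0] + t * se_loa)
ci_upper = (loa[1] - t * se_loa, loa[1] + t * se_loa)
print("LoA:", loa, "CI of lower limit:", ci_lower, "CI of upper limit:", ci_upper)
```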


Stats ◽  
2021 ◽  
Vol 4 (4) ◽  
pp. 1051-1068
Author(s):  
Andrei V. Zenkov

We suggest two approaches to the statistical analysis of texts, both based on the study of the occurrence of numerals in literary texts. The first approach is related to Benford’s Law and the analysis of the frequency distribution of the leading digits of numerals contained in the text. In coherent literary texts, the share of the leading digit 1 is even larger than prescribed by Benford’s Law and can reach 50 percent. The frequencies of occurrence of the digit 1, as well as, to a lesser extent, the digits 2 and 3, are usually a characteristic feature of the author’s style, manifested in all (sufficiently long) literary texts by that author. This approach is convenient for testing whether a group of texts has common authorship: the latter is dubious if the frequency distributions are sufficiently different. The second approach is an extension of the first one and requires the study of the frequency distribution of the numerals themselves (not their leading digits). This approach yields non-trivial information about the author and the stylistic and genre peculiarities of the texts, and is suited for advanced stylometric analysis. The proposed approaches are illustrated by examples of computer analysis of literary texts in English and Russian.
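A minimal sketch of the first approach is given below: extract numerals written with digits from a text, take their leading digits, and compare the empirical frequencies with Benford's Law. Spelled-out number words, which matter in literary texts, are not handled in this toy version.

```python
import re
from collections import Counter
from math import log10

def leading_digit_profile(text):
    """Return the empirical leading-digit frequencies of numerals written
    with digits in `text`, alongside the Benford's Law frequencies
    log10(1 + 1/d) for d = 1..9."""
    benford = {d: log10(1 + 1 / d) for d in range(1, 10)}
    numerals = re.findall(r"\d+", text)
    leads = [n.lstrip("0")[0] for n in numerals if n.lstrip("0")]
    if not leads:
        return {}, benford
    counts = Counter(leads)
    total = len(leads)
    empirical = {d: counts.get(str(d), 0) / total for d in range(1, 10)}
    return empirical, benford

sample = "In 1815 he sold 3 horses for 120 pounds; 14 years later only 2 remained."
print(leading_digit_profile(sample))
```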


Stats ◽  
2021 ◽  
Vol 4 (4) ◽  
pp. 1027-1050
Author(s):  
Pushpa Narayan Rathie ◽  
Luan Carlos de Sena Monteiro Ozelim ◽  
Bernardo Borba de Andrade

Modern portfolio theory indicates that portfolio optimization can be carried out based on the mean-variance model, where returns and risk are represented as the average and variance of the historical data of the stock’s returns, respectively. Several studies have been carried out to find better risk proxies, as variance is not a sufficiently accurate one. On the other hand, fewer papers are devoted to better modelling and characterizing returns. In the present paper, we explore the use of the reliability measure P(Y<X) to choose between portfolios with returns given by the distributions X and Y. Thus, instead of comparing the expected values of X and Y, we explore the metric P(Y<X) as a proxy parameter for return. The dependence between such distributions is modelled by copulas. At first, we derive some general results which allow us to split the value of P(Y<X) into the sum of independent and dependent parts, in general, for copula-dependent assets. Then, to further develop our mathematical framework, we choose the Frank copula to model the dependency between assets. In the process, we derive a new polynomial representation for Frank copulas. For a case study, we considered assets whose return distributions follow Dagum distributions or their transformations. We carried out a parametric analysis, indicating the relative effect of the dependency of return distributions on the reliability index P(Y<X). Finally, we illustrate our methodology by performing a comparison between stock returns, which could be used to build portfolios based on the value of the reliability index P(Y<X).
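The reliability index can be approximated by Monte Carlo once the copula and marginals are fixed; the sketch below samples from a Frank copula by conditional inversion and uses lognormal marginals as illustrative stand-ins for the Dagum-type distributions studied in the paper. The closed-form and polynomial results derived there are not reproduced.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def frank_copula_sample(theta, size):
    """Sample (U, V) from a Frank copula (theta != 0) by conditional inversion:
    draw U, W ~ Uniform(0, 1) independently and solve C(v | u) = w for v."""
    u = rng.random(size)
    w = rng.random(size)
    v = -np.log1p(w * np.expm1(-theta) / (w + (1 - w) * np.exp(-theta * u))) / theta
    return u, v

def prob_y_less_x(theta, n=200_000):
    """Monte Carlo estimate of the reliability index P(Y < X) when the
    dependence between the two return distributions is a Frank copula.
    Lognormal marginals are illustrative stand-ins for the Dagum-type
    distributions used in the paper."""
    u, v = frank_copula_sample(theta, n)
    x = stats.lognorm.ppf(u, s=0.4, scale=1.05)   # return distribution X
    y = stats.lognorm.ppf(v, s=0.5, scale=1.00)   # return distribution Y
    return np.mean(y < x)

# The strength and sign of the dependence (theta) shift the reliability index.
for theta in (-5.0, 0.5, 5.0):
    print(theta, round(prob_y_less_x(theta), 3))
```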

