Open Statistics
Latest Publications


TOTAL DOCUMENTS

6
(FIVE YEARS 6)

H-INDEX

1
(FIVE YEARS 1)

Published By Walter De Gruyter Gmbh

2657-3601

2021 ◽  
Vol 1 (1) ◽  
pp. 49-58
Author(s):  
Mårten Schultzberg ◽  
Per Johansson

Abstract Recently, a computation-based experimental design strategy called rerandomization has been proposed as an alternative or complement to traditional blocked designs. The idea of rerandomization is to remove from consideration those allocations with large imbalances in observed covariates according to a balance criterion, and then to randomize within the set of acceptable allocations. Based on the Mahalanobis distance criterion for balancing the covariates, we show that asymptotic inference to the population from which the units in the sample are randomly drawn is possible using only the set of best, or ‘optimal’, allocations. Finally, we show that for the optimal and near-optimal designs, the quite complex asymptotic sampling distribution derived by Li et al. (2018) is well approximated by a normal distribution.
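The accept/reject idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the two-covariate restriction, the equal-sized arms, and the acceptance threshold are all assumptions made for illustration, and the paper itself studies the set of best allocations rather than a fixed threshold.

```python
import random

def mahalanobis_balance(X, assign):
    """Mahalanobis distance between arm means for two covariates (illustrative helper)."""
    n = len(X)
    treated = [x for x, a in zip(X, assign) if a == 1]
    control = [x for x, a in zip(X, assign) if a == 0]
    n1, n0 = len(treated), len(control)
    # Difference in covariate means between treated and control units.
    d = [sum(x[j] for x in treated) / n1 - sum(x[j] for x in control) / n0
         for j in range(2)]
    # 2x2 sample covariance matrix of the covariates over all units.
    m = [sum(x[j] for x in X) / n for j in range(2)]
    s = [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in X) / (n - 1)
          for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    quad = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return (n1 * n0 / n) * quad

def rerandomize(X, threshold, rng, max_draws=10000):
    """Draw balanced-size allocations until one meets the balance criterion."""
    n = len(X)
    base = [1] * (n // 2) + [0] * (n - n // 2)
    for _ in range(max_draws):
        assign = base[:]
        rng.shuffle(assign)
        if mahalanobis_balance(X, assign) <= threshold:
            return assign
    raise RuntimeError("no acceptable allocation found within max_draws")

# Usage with simulated covariates (hypothetical data, threshold chosen arbitrarily).
rng = random.Random(1)
covariates = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
allocation = rerandomize(covariates, threshold=0.5, rng=rng)
print(allocation, mahalanobis_balance(covariates, allocation))
```

Lowering the threshold shrinks the set of acceptable allocations; the limiting case, in which only the allocations with the smallest distance are kept, is the setting the abstract calls optimal.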


2021 ◽  
Vol 2 (1) ◽  
pp. 81-112
Author(s):  
Taeho Kim ◽  
Benjamin Lieberman ◽  
George Luta ◽  
Edsel A. Peña

Abstract Motivated by the Coronavirus Disease (COVID-19) pandemic, which is due to the SARS-CoV-2 virus, and the important problem of forecasting the number of daily deaths and the number of cumulative deaths, this paper examines the construction of prediction regions or intervals under the no-covariate or intercept-only Poisson model, the Poisson regression model, and a new over-dispersed Poisson regression model. These models are useful for settings with events of interest that are rare. For the no-covariate Poisson and the Poisson regression models, several prediction regions are developed and their performances are compared through simulation studies. The methods are applied to the problem of forecasting the number of daily deaths and the number of cumulative deaths in the United States (US) due to COVID-19. To examine their predictive accuracy in light of what actually happened, daily deaths data until May 15, 2020 were used to forecast cumulative deaths by June 1, 2020. It was observed that there is over-dispersion in the observed data relative to the Poisson regression model. A novel over-dispersed Poisson regression model is therefore proposed. This new model, which is distinct from the negative binomial regression (NBR) model, builds on frailty ideas in Survival Analysis, with over-dispersion quantified through an additional parameter. It has the flavor of a discrete measurement error model and, in contrast to the NBR model, has a viable physical interpretation. The Poisson regression model is a hidden model in this over-dispersed Poisson regression model, obtained as a limiting case when the over-dispersion parameter increases to infinity. A prediction region for the cumulative number of US deaths due to COVID-19 by October 1, 2020, given the data until September 1, 2020, is presented. Realized daily and cumulative deaths values from September 1st until September 25th are compared to the prediction region limits.
Finally, the paper discusses limitations of the proposed procedures and mentions open research problems. It also pinpoints dangers and pitfalls when forecasting on a long horizon, especially during a pandemic where events, both foreseen and unforeseen, could impact point predictions and prediction regions.
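For the intercept-only Poisson model mentioned in this abstract, the simplest possible prediction interval is a plug-in interval built from the estimated rate. The sketch below shows only that naive baseline, under the assumption of independent counts with a common rate; the function names are hypothetical and it is not one of the prediction regions developed in the paper. Because it ignores the uncertainty in the estimated rate, it can be expected to under-cover for small samples.

```python
import math

def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam), summing the pmf term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # pmf at 0; underflows for very large lam
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def plug_in_prediction_interval(counts, alpha=0.05):
    """Equal-tailed interval from Poisson(lam_hat), with lam_hat the sample mean."""
    lam_hat = sum(counts) / len(counts)
    lower = 0
    while poisson_cdf(lower, lam_hat) < alpha / 2:
        lower += 1
    upper = lower
    while poisson_cdf(upper, lam_hat) < 1 - alpha / 2:
        upper += 1
    return lower, upper

# Usage with made-up daily counts (not data from the paper).
observed = [3, 5, 4, 6, 2, 4]
print(plug_in_prediction_interval(observed))
```

If the rate were known exactly, this interval would contain the next count with probability at least 1 - alpha; the gap between that idealized coverage and the coverage under an estimated rate is the kind of issue that motivates more careful prediction regions.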


2021 ◽  
Vol 2 (1) ◽  
pp. 24-36
Author(s):  
Stan Lipovetsky

Abstract Complex managerial problems are usually described by datasets with multiple variables, and in the absence of a theoretical model, the data structures can be found by special multivariate statistical techniques. For two datasets, canonical correlation analysis and its robust version are known as good working research tools. This paper presents their further development via the orthonormal approximation of data matrices, which corresponds to using singular value decomposition in the canonical correlations. The features of the new method are described and applications are considered. This type of multivariate analysis is useful for solving various practical problems of applied statistics that involve two datasets, and can be helpful in managerial estimation and decision making.


2021 ◽  
Vol 2 (1) ◽  
pp. 37-80
Author(s):  
Hiroyuki Kawakatsu

Abstract This paper proposes the use of a spliced distribution with a generalized Pareto tail for financial risk management. The proposed distribution is tailored to flexibly capture the heavy tail in asset return distributions. The parameters of the distribution can be estimated jointly with a conditional heteroskedasticity model. The estimated parameters can then be used to produce tail risk forecasts for risk management purposes. The use of the proposed distribution is illustrated by evaluating tail risk forecasts for a number of major stock indices.
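To make the notion of a spliced distribution concrete, the following sketch joins a standard normal body to a generalized Pareto right tail above a threshold u. The choice of a normal body, the single right-tail splice, and every name and parameter value here are assumptions for illustration; the abstract does not specify the paper's exact construction, and no conditional heteroskedasticity model is included.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gpd_pdf(z, xi, beta):
    """Generalized Pareto density for an exceedance z >= 0 (shape xi, scale beta)."""
    if z < 0:
        return 0.0
    if xi == 0:
        return math.exp(-z / beta) / beta
    base = 1.0 + xi * z / beta
    return base ** (-1.0 / xi - 1.0) / beta if base > 0 else 0.0

def spliced_pdf(x, u, xi, beta):
    """Normal density up to u; GPD tail carrying mass 1 - Phi(u) above u."""
    if x <= u:
        return normal_pdf(x)
    return (1.0 - normal_cdf(u)) * gpd_pdf(x - u, xi, beta)

def spliced_sf(x, u, xi, beta):
    """Survival function P(X > x) for x above the threshold u, with xi > 0."""
    return (1.0 - normal_cdf(u)) * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)

def tail_quantile(p, u, xi, beta):
    """Quantile at level p > Phi(u), with xi > 0 (a value-at-risk style tail measure)."""
    tail_mass = 1.0 - normal_cdf(u)
    return u + (beta / xi) * (((1.0 - p) / tail_mass) ** (-xi) - 1.0)

# Usage with arbitrary illustrative parameters.
print(tail_quantile(0.99, u=1.5, xi=0.2, beta=0.6))
```

The tail weight 1 - Phi(u) makes the spliced density integrate to one, and a positive shape parameter xi gives a polynomially decaying tail that is heavier than the normal tail it replaces.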


2021 ◽  
Vol 2 (1) ◽  
pp. 1-23
Author(s):  
Christophe Chesneau

Abstract In probability and statistics, unit distributions are used to model proportions, rates, and percentages, among other things. This paper is about a new one-parameter unit distribution, whose probability density function is defined by an original ratio of power and logarithmic functions. This function has a wide range of J shapes, some of which are more angular than others. In this sense, the proposed distribution can be thought of as an “extremely left skewed alternative” to the traditional power distribution. We discuss its main characteristics, including other features of the probability density function, some stochastic order results, the closed-form expression of the cumulative distribution function involving special integral functions, the quantile and hazard rate functions, and simple expressions for the ordinary moments, skewness, kurtosis, moment generating function, incomplete moments, logarithmic moments and logarithmically weighted moments. Subsequently, a simple example of an application is given using simulated data, with a fair comparison to the power model supported by numerical and graphical illustrations. A new modelling strategy beyond the unit domain is also proposed and developed, with an application to a survival times data set.


2019 ◽  
Vol 1 (1) ◽  
pp. 1-48
Author(s):  
Björn Böttcher

Abstract Distance multivariance is a multivariate dependence measure that can detect dependencies between an arbitrary number of random vectors, each of which can have a distinct dimension. Here we discuss several new aspects, present a concise overview and use it as the basis for several new results and concepts. In particular, we show that distance multivariance unifies (and extends) distance covariance and the Hilbert-Schmidt independence criterion (HSIC); moreover, the classical linear dependence measures (covariance, Pearson’s correlation and the RV coefficient) appear as limiting cases. Based on distance multivariance, several new measures are defined: a multicorrelation, which satisfies a natural set of multivariate dependence measure axioms, and m-multivariance, which is a dependence measure yielding tests for pairwise independence and independence of higher order. These tests are computationally feasible, and under very mild moment conditions they are consistent against all alternatives. Moreover, a general visualization scheme for higher order dependencies is proposed, including consistent estimators (based on distance multivariance) for the dependence structure. Many illustrative examples are provided. All functions for the use of distance multivariance in applications are published in the R-package multivariance.
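Distance covariance, one of the measures this abstract says distance multivariance unifies, has a short sample version for two univariate samples: double-center each pairwise distance matrix and average the elementwise product. The sketch below shows only that two-sample special case, with hypothetical function names; it does not use or reproduce the API of the R-package multivariance.

```python
def centered_distances(v):
    """Double-centered matrix of pairwise absolute distances for a 1-D sample."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row_means = [sum(r) / n for r in d]  # equal to column means by symmetry
    grand_mean = sum(row_means) / n
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

def distance_covariance_sq(x, y):
    """Squared sample distance covariance of two equally long 1-D samples."""
    n = len(x)
    a = centered_distances(x)
    b = centered_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2

# Usage: y is a deterministic function of x, yet their linear covariance is zero.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
print(distance_covariance_sq(x, y))
```

In this example the ordinary covariance of x and y vanishes because the relationship is symmetric around zero, while the distance covariance is strictly positive, which is the kind of nonlinear dependence that distance-based measures are designed to detect.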

