Specification tests for covariance structures in high-dimensional statistical models

Biometrika ◽  
2020 ◽  
Author(s):  
X Guo ◽  
C Y Tang

Summary We consider testing the covariance structure in statistical models. We focus on developing such tests when the random vectors of interest are not directly observable and have to be derived via estimated models. Additionally, the covariance specification may involve extra nuisance parameters which also need to be estimated. In a generic additive model setting, we develop and investigate test statistics based on the maximum discrepancy measure calculated from the residuals. To approximate the distributions of the test statistics under the null hypothesis, new multiplier bootstrap procedures with dedicated adjustments that incorporate the model and nuisance parameter estimation errors are proposed. Our theoretical development elucidates the impact due to the estimation errors with high-dimensional data and demonstrates the validity of our tests. Simulations and real data examples confirm our theory and demonstrate the performance of the proposed tests.
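As a rough illustration of a maximum-discrepancy statistic calibrated by a multiplier bootstrap, the sketch below tests a fully specified null covariance `sigma0` on directly observed data. It is a simplified stand-in, not the authors' procedure: it omits the adjustments for model and nuisance-parameter estimation error that the abstract describes, and the function name `max_discrepancy_test` and its parameters are hypothetical.

```python
import numpy as np

def max_discrepancy_test(X, sigma0, n_boot=500, seed=0):
    """Max-type test of H0: Cov(X) = sigma0 with a Gaussian multiplier bootstrap.

    Simplified sketch: X is assumed directly observed (no estimated residuals,
    no nuisance parameters), so no estimation-error adjustment is applied.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                      # center each column
    iu = np.triu_indices(p)                      # upper triangle incl. diagonal
    prods = Xc[:, iu[0]] * Xc[:, iu[1]]          # n x p(p+1)/2 cross-products
    # Test statistic: largest scaled deviation from the hypothesized entries.
    stat = np.sqrt(n) * np.max(np.abs(prods.mean(axis=0) - sigma0[iu]))
    centered = prods - prods.mean(axis=0)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        e = rng.standard_normal(n)               # Gaussian multipliers
        boot[b] = np.max(np.abs(e @ centered)) / np.sqrt(n)
    p_value = float(np.mean(boot >= stat))
    return stat, p_value

# Usage: the first two variables are correlated, so H0: Cov = I should be rejected.
rng = np.random.default_rng(1)
Z = rng.standard_normal((200, 5))
X_alt = Z.copy()
X_alt[:, 1] = 0.8 * Z[:, 0] + 0.6 * Z[:, 1]
stat_alt, p_alt = max_discrepancy_test(X_alt, np.eye(5))
```

The bootstrap reuses the centered cross-products, so each replicate costs one matrix-vector product; the statistic is left unstudentized purely to keep the sketch short.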

Author(s):  
Homayun Afrabandpey ◽  
Tomi Peltola ◽  
Samuel Kaski

Learning predictive models from small high-dimensional data sets is a key problem in high-dimensional statistics. Expert knowledge elicitation can help, and a strong line of work focuses on directly eliciting informative prior distributions for parameters. This either requires considerable statistical expertise or is laborious, as the emphasis has been on accuracy and not on efficiency of the process. Another line of work queries about importance of features one at a time, assuming them to be independent and hence missing covariance information. In contrast, we propose eliciting expert knowledge about pairwise feature similarities, to borrow statistical strength in the predictions, and using sequential decision making techniques to minimize the effort of the expert. Empirical results demonstrate improvement in predictive performance on both simulated and real data, in high-dimensional linear regression tasks, where we learn the covariance structure with a Gaussian process, based on sequential elicitation.


Biometrika ◽  
2020 ◽  
Author(s):  
W Van Den Boom ◽  
G Reeves ◽  
D B Dunson

Abstract Posterior computation for high-dimensional data with many parameters can be challenging. This article focuses on a new method for approximating posterior distributions of a low- to moderate-dimensional parameter in the presence of a high-dimensional or otherwise computationally challenging nuisance parameter. The focus is on regression models and the key idea is to separate the likelihood into two components through a rotation. One component involves only the nuisance parameters, which can then be integrated out using a novel type of Gaussian approximation. We provide theory on approximation accuracy that holds for a broad class of forms of the nuisance component and priors. Applying our method to simulated and real data sets shows that it can outperform state-of-the-art posterior approximation approaches.


2016 ◽  
Vol 4 (1) ◽  
Author(s):  
Xiangzhao Cui ◽  
Chun Li ◽  
Jine Zhao ◽  
Li Zeng ◽  
Defei Zhang ◽  
...  

Abstract In many applications, high-dimensional problems arise for various reasons, for example when the number of variables under consideration is much larger than the sample size, i.e., p >> n. For high-dimensional data, the underlying structures of certain covariance matrix estimates are usually blurred by substantial random noise, which is an obstacle to drawing statistical inferences. In this paper, we propose a method to identify the underlying covariance structure by regularizing a given/estimated covariance matrix so that the noise can be filtered. By choosing an optimal structure from a class of candidate structures for the covariance matrix, the regularization is made in terms of minimizing Frobenius-norm discrepancy. The candidate class considered here includes the structures of order-1 moving average, compound symmetry, order-1 autoregressive and order-1 autoregressive moving average. Very intensive simulation studies are conducted to assess the performance of the proposed regularization method for very high-dimensional covariance problems. The simulation studies also show that the sample covariance matrix, although it performs very badly in covariance estimation for high-dimensional data, can be used to correctly identify the underlying structure of the covariance matrix. The approach is also applied to real data analysis, which shows that the proposed regularization method works well in practice.
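The idea of picking a structure by minimizing Frobenius-norm discrepancy can be sketched as follows. This is a minimal illustration covering only two of the candidate structures (compound symmetry and order-1 autoregressive); the grid search over the correlation parameter and all function names are assumptions made for the example, not the authors' algorithm.

```python
import numpy as np

def fit_compound_symmetry(S):
    """Closest compound-symmetry matrix to S in Frobenius norm (closed form)."""
    p = S.shape[0]
    diag_mean = np.trace(S) / p
    off_mean = (S.sum() - np.trace(S)) / (p * (p - 1))
    fitted = np.full((p, p), off_mean)
    np.fill_diagonal(fitted, diag_mean)
    return fitted

def fit_ar1(S, rhos=np.linspace(-0.99, 0.99, 199)):
    """Closest sigma2 * rho^|i-j| matrix to S, using a grid search over rho."""
    p = S.shape[0]
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    best_err, best_fit = np.inf, None
    for rho in rhos:
        R = rho ** lags
        sigma2 = np.sum(S * R) / np.sum(R * R)   # least-squares scale for this rho
        fitted = sigma2 * R
        err = np.linalg.norm(S - fitted, "fro")
        if err < best_err:
            best_err, best_fit = err, fitted
    return best_fit

def select_structure(S):
    """Return (name, fitted matrix, discrepancy) of the best candidate."""
    candidates = {
        "compound_symmetry": fit_compound_symmetry(S),
        "ar1": fit_ar1(S),
    }
    errors = {k: np.linalg.norm(S - v, "fro") for k, v in candidates.items()}
    name = min(errors, key=errors.get)
    return name, candidates[name], errors[name]

# Usage: an exact AR(1) covariance with sigma2 = 2 and rho = 0.5.
p = 6
lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
S_ar1 = 2.0 * 0.5 ** lags
name_ar1, fit_ar1_mat, err_ar1 = select_structure(S_ar1)
```

In practice `S` would be a noisy sample covariance matrix; the exact matrix is used here only so the selected structure can be checked by eye.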


2020 ◽  
Author(s):  
Eduardo Atem De Carvalho ◽  
Rogerio Atem De Carvalho

BACKGROUND Since the beginning of the COVID-19 pandemic, researchers and health authorities have sought to identify the different parameters that govern its infection and death cycles, in order to be able to make better decisions. In particular, a series of reproduction number estimation models have been presented, with different practical results. OBJECTIVE This article aims to present an effective and efficient model for estimating the Reproduction Number and to discuss the impacts of sub-notification on these calculations. METHODS The concept of Moving Average Method with Initial value (MAMI) is used, and a model for Rt, the Reproduction Number, is derived from experimental data. The models are applied to real data and their performance is presented. RESULTS Analyses on Rt and sub-notification effects for Germany, Italy, Sweden, United Kingdom, South Korea, and the State of New York are presented to show the performance of the methods here introduced. CONCLUSIONS We show that, with relatively simple mathematical tools, it is possible to obtain reliable values for time-dependent, incubation period-independent Reproduction Numbers (Rt). We also demonstrate that the impact of sub-notification is relatively low, after the initial phase of the epidemic cycle has passed.


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Hanji He ◽  
Guangming Deng

We extend the mean empirical likelihood inference for the response mean with data missing at random. The empirical likelihood ratio confidence regions are poor when the response is missing at random, especially when the covariate is high-dimensional and the sample size is small. Hence, we develop three bias-corrected mean empirical likelihood approaches to obtain efficient inference for the response mean. For the three bias-corrected estimating equations, we obtain a new set by constructing a pairwise-mean dataset. The method can increase the size of the sample for estimation and reduce the impact of the curse of dimensionality. Consistency and asymptotic normality of the maximum mean empirical likelihood estimators are established. The finite sample performance of the proposed estimators is presented through simulation, and an application to the Boston Housing dataset is shown.


Econometrics ◽  
2021 ◽  
Vol 9 (1) ◽  
pp. 10
Author(s):  
Šárka Hudecová ◽  
Marie Hušková ◽  
Simos G. Meintanis

This article considers goodness-of-fit tests for bivariate INAR and bivariate Poisson autoregression models. The test statistics are based on an L2-type distance between two estimators of the probability generating function of the observations: one being entirely nonparametric and the second one being semiparametric computed under the corresponding null hypothesis. The asymptotic distribution of the proposed test statistics both under the null hypotheses and under alternatives is derived and consistency is proved. The case of testing bivariate generalized Poisson autoregression and extension of the methods to dimension higher than two are also discussed. The finite-sample performance of a parametric bootstrap version of the tests is illustrated via a series of Monte Carlo experiments. The article concludes with applications on real data sets and discussion.


2020 ◽  
Vol 499 (3) ◽  
pp. 4054-4067
Author(s):  
Steven Cunnington ◽  
Stefano Camera ◽  
Alkistis Pourtsidou

ABSTRACT Potential evidence for primordial non-Gaussianity (PNG) is expected to lie in the largest scales mapped by cosmological surveys. Forthcoming 21 cm intensity mapping experiments will aim to probe these scales by surveying neutral hydrogen (H i) within galaxies. However, foreground signals dominate the 21 cm emission, meaning foreground cleaning is required to recover the cosmological signal. The effect this has is to damp the H i power spectrum on the largest scales, especially along the line of sight. Whilst there is agreement that this contamination is potentially problematic for probing PNG, it is yet to be fully explored and quantified. In this work, we carry out the first forecasts on fNL that incorporate simulated foreground maps that are removed using techniques employed in real data. Using a Monte Carlo Markov Chain analysis on an SKA1-MID-like survey, we demonstrate that foreground cleaned data recovers biased values [$f_{\rm NL}= -102.1_{-7.96}^{+8.39}$ (68 per cent CL)] on our fNL = 0 fiducial input. Introducing a model with fixed parameters for the foreground contamination allows us to recover unbiased results ($f_{\rm NL}= -2.94_{-11.9}^{+11.4}$). However, it is not clear that we will have sufficient understanding of foreground contamination to allow for such rigid models. Treating the main parameter $k_\parallel ^\text{FG}$ in our foreground model as a nuisance parameter and marginalizing over it still recovers unbiased results but at the expense of larger errors ($f_{\rm NL}= 0.75^{+40.2}_{-44.5}$), which can only be reduced by imposing the Planck 2018 prior. Our results show that significant progress on understanding and controlling foreground removal effects is necessary for studying PNG with H i intensity mapping.


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1250
Author(s):  
Daniel Medina ◽  
Haoqing Li ◽  
Jordi Vilà-Valls ◽  
Pau Closas

Global navigation satellite systems (GNSSs) play a key role in intelligent transportation systems such as autonomous driving or unmanned systems navigation. In such applications, it is fundamental to ensure a reliable precise positioning solution able to operate in harsh propagation conditions such as urban environments and under multipath and other disturbances. Exploiting carrier phase observations allows for precise positioning solutions at the complexity cost of resolving integer phase ambiguities, a procedure that is particularly affected by non-nominal conditions. This limits the applicability of conventional filtering techniques in challenging scenarios, and new robust solutions must be accounted for. This contribution deals with real-time kinematic (RTK) positioning and the design of robust filtering solutions for the associated mixed integer- and real-valued estimation problem. Families of Kalman filter (KF) approaches based on robust statistics and variational inference are explored, such as the generalized M-based KF or the variational-based KF, aiming to mitigate the impact of outliers or non-nominal measurement behaviors. The performance assessment under harsh propagation conditions is realized using a simulated scenario and real data from a measurement campaign. The proposed robust filtering solutions are shown to offer excellent resilience against outlying observations, with the variational-based KF showcasing the overall best performance in terms of Gaussian efficiency and robustness.


2021 ◽  
Vol 10 (s1) ◽  
Author(s):  
Said Gounane ◽  
Yassir Barkouch ◽  
Abdelghafour Atlas ◽  
Mostafa Bendahmane ◽  
Fahd Karami ◽  
...  

Abstract Recently, various mathematical models have been proposed to model the COVID-19 outbreak. These models are an effective tool to study the mechanisms of coronavirus spreading and to predict the future course of COVID-19 disease. They are also used to evaluate strategies to control this pandemic. Generally, SIR compartmental models are appropriate for understanding and predicting the dynamics of infectious diseases like COVID-19. The classical SIR model was initially introduced by Kermack and McKendrick (cf. (Anderson, R. M. 1991. “Discussion: the Kermack–McKendrick Epidemic Threshold Theorem.” Bulletin of Mathematical Biology 53 (1): 3–32; Kermack, W. O., and A. G. McKendrick. 1927. “A Contribution to the Mathematical Theory of Epidemics.” Proceedings of the Royal Society 115 (772): 700–21)) to describe the evolution of the susceptible, infected and recovered compartments. Focused on the impact of public policies designed to contain this pandemic, we develop a new nonlinear SIR epidemic problem modeling the spreading of coronavirus under the effect of social distancing induced by government measures to stop coronavirus spreading. To find the parameters adopted for each country (e.g. Germany, Spain, Italy, France, Algeria and Morocco) we fit the proposed model to real data. We also evaluate the government measures in each country with respect to the evolution of the pandemic. Our numerical simulations can be used to provide an effective tool for predicting the spread of the disease.
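For orientation, the classical SIR dynamics mentioned above can be simulated in a few lines. The sketch below uses a forward-Euler step on population fractions and an optional, purely hypothetical `distancing(t)` factor that scales the contact rate over time; it is not the nonlinear model proposed in the article, and the parameter values are illustrative only.

```python
import numpy as np

def simulate_sir(beta, gamma, s0, i0, days, dt=0.1, distancing=None):
    """Forward-Euler simulation of the classical SIR model on fractions.

    dS/dt = -c(t) * beta * S * I
    dI/dt =  c(t) * beta * S * I - gamma * I
    dR/dt =  gamma * I
    where c(t) is an assumed contact-reduction factor (1.0 if not given).
    """
    steps = int(round(days / dt))
    s, i, r = s0, i0, 1.0 - s0 - i0
    traj = np.empty((steps + 1, 3))
    traj[0] = (s, i, r)
    for k in range(steps):
        t = k * dt
        scale = 1.0 if distancing is None else distancing(t)
        new_inf = scale * beta * s * i * dt      # S -> I flow over one step
        new_rec = gamma * i * dt                 # I -> R flow over one step
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        traj[k + 1] = (s, i, r)
    return traj

# Usage: compare an unmitigated epidemic with contacts halved from day 10.
baseline = simulate_sir(beta=0.3, gamma=0.1, s0=0.999, i0=0.001, days=200)
mitigated = simulate_sir(
    beta=0.3, gamma=0.1, s0=0.999, i0=0.001, days=200,
    distancing=lambda t: 0.5 if t >= 10 else 1.0,
)
peak_baseline = baseline[:, 1].max()
peak_mitigated = mitigated[:, 1].max()
```

Fitting such a model to country-level data would additionally require an optimizer over `beta`, `gamma`, and the distancing schedule, which is left out here.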


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sairish Ashraf ◽  
Shayaq Ul Abeer Rasool ◽  
Mudasar Nabi ◽  
Mohd Ashraf Ganie ◽  
Shariq R. Masoodi ◽  
...  

Abstract Polycystic ovary syndrome (PCOS) is the most common reproductive endocrine disorder in pre-menopausal women and has a complex pathophysiology. Several candidate genes have been shown to be associated with PCOS. The CYP19 gene encodes a key steroidogenic enzyme involved in the conversion of androgens into estrogens. Previous studies have reported contradictory results with regard to the association of SNP rs2414096 in the CYP19 gene with PCOS and hyperandrogenism in different ethnic populations. The present study aimed to investigate the impact of the SNP rs2414096 polymorphism of the CYP19 gene on susceptibility to PCOS and hyperandrogenism in Kashmiri women. We also studied the genotypic-phenotypic association of this polymorphism for various clinical and biochemical parameters. This case-control study included 394 PCOS cases diagnosed on the basis of Rotterdam criteria and 306 age-matched healthy women. We found significant differences in genotypic frequency (χ2 = 18.91, p < 0.05) as well as allele frequency (OR 0.63, CI 0.51–0.78, χ2 = 17.66, p < 0.05) between PCOS women and controls. The genotype–phenotype correlation analysis showed a significant difference in FG score (p = 0.047) and alopecia (p = 0.045) between the three genotypes. Also, the androgen excess markers DHEAS (p < 0.001), Androstenedione (p < 0.001), Testosterone (p < 0.001) and FAI (p = 0.005) were significantly elevated in the GG genotype and showed a significant difference in the additive model in PCOS women. The rs2414096 polymorphism of the CYP19 gene is associated with the risk of PCOS as well as with clinical and biochemical markers of hyperandrogenism, suggesting a role in the clinical manifestations of PCOS in Kashmiri women.

