Multinomial goodness-of-fit based on U-statistics: High-dimensional asymptotic and minimax optimality

2020 ◽  
Vol 205 ◽  
pp. 74-91
Author(s):  
Ilmun Kim


Author(s):  
Hongyi Xu ◽  
Zhen Jiang ◽  
Daniel W. Apley ◽  
Wei Chen

Data-driven random process models have become increasingly important for uncertainty quantification (UQ) in science and engineering applications, owing to their ability to capture both the marginal distributions and the correlations of high-dimensional responses. However, the choice of a random process model is neither unique nor straightforward. To quantitatively validate the accuracy of random process UQ models, new metrics are needed to measure their capability to capture the statistical information of high-dimensional data collected from simulations or experimental tests. In this work, two goodness-of-fit (GOF) metrics, namely a statistical moment-based metric (SMM) and an M-margin U-pooling metric (MUPM), are proposed for comparing different stochastic models, taking into account their ability to capture the marginal distributions and the correlations in spatial/temporal domains. This work demonstrates the effectiveness of the two proposed metrics by comparing the accuracies of four random process models (Gaussian process (GP), Gaussian copula, Hermite polynomial chaos expansion (PCE), and Karhunen–Loève (K–L) expansion) in multiple numerical examples and an engineering example of stochastic analysis of microstructural material properties. In addition to the new metrics, this paper provides insights into the pros and cons of various data-driven random process models in UQ.
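The SMM and MUPM metrics are defined precisely in the paper itself. As a rough, hedged illustration of the moment-based idea only (not the authors' implementation), the sketch below compares the first few marginal moments and the spatial correlation of observed realizations against samples drawn from a fitted random process model; the function names, error measures, and demo data are all hypothetical.

```python
import numpy as np
from scipy import stats

def moment_based_discrepancy(data, model_samples):
    """Rough moment-based comparison between observed realizations and
    realizations drawn from a fitted random process model.

    data, model_samples : arrays of shape (n_realizations, n_locations)
    Returns relative errors in the marginal moments and the correlation matrix.
    """
    def marginal_moments(x):
        # Mean, std, skewness, kurtosis at each spatial/temporal location.
        return np.vstack([x.mean(axis=0),
                          x.std(axis=0, ddof=1),
                          stats.skew(x, axis=0),
                          stats.kurtosis(x, axis=0)])

    m_data, m_model = marginal_moments(data), marginal_moments(model_samples)
    corr_data = np.corrcoef(data, rowvar=False)
    corr_model = np.corrcoef(model_samples, rowvar=False)

    return {
        "marginal_moment_error":
            np.linalg.norm(m_model - m_data) / np.linalg.norm(m_data),
        "correlation_error":
            np.linalg.norm(corr_model - corr_data, "fro")
            / np.linalg.norm(corr_data, "fro"),
    }

# Hypothetical usage: smaller discrepancies indicate a better random process model.
rng = np.random.default_rng(0)
data = rng.multivariate_normal(np.zeros(20), np.eye(20), size=200)
gp_samples = rng.multivariate_normal(np.zeros(20), np.eye(20), size=200)
print(moment_based_discrepancy(data, gp_samples))
```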


Biometrika ◽  
2019 ◽  
Vol 106 (3) ◽  
pp. 716-723
Author(s):  
Mengyu Xu ◽  
Danna Zhang ◽  
Wei Biao Wu

Summary: We establish an approximation theory for Pearson's chi-squared statistics in situations where the number of cells is large, by using a high-dimensional central limit theorem for quadratic forms of random vectors. Our high-dimensional central limit theorem is proved under Lyapunov-type conditions that involve a delicate interplay between the dimension, the sample size, and the moment conditions. We propose a modified chi-squared statistic and introduce adjusted degrees of freedom. A simulation study shows that the modified statistic outperforms Pearson's chi-squared statistic in terms of both size accuracy and power. Our procedure is applied to the construction of a goodness-of-fit test for Rutherford's alpha-particle data.
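The modified statistic and its adjusted degrees of freedom are developed in the paper. For context, here is a minimal sketch of the classical Pearson chi-squared goodness-of-fit test that the paper refines for the many-cells regime; the cell counts and probabilities in the example are illustrative, not Rutherford's data.

```python
import numpy as np
from scipy.stats import chi2

def pearson_chi2_gof(observed_counts, expected_probs):
    """Classical Pearson chi-squared goodness-of-fit test for multinomial counts.
    The paper studies the regime where the number of cells is large and proposes
    a modified statistic with adjusted degrees of freedom."""
    observed_counts = np.asarray(observed_counts, dtype=float)
    n = observed_counts.sum()
    expected = n * np.asarray(expected_probs, dtype=float)
    stat = np.sum((observed_counts - expected) ** 2 / expected)
    dof = len(observed_counts) - 1            # classical degrees of freedom
    p_value = chi2.sf(stat, dof)              # chi-squared tail probability
    return stat, dof, p_value

# Illustrative example with many cells relative to the sample size.
rng = np.random.default_rng(1)
k = 100
probs = np.full(k, 1.0 / k)
counts = rng.multinomial(500, probs)
print(pearson_chi2_gof(counts, probs))
```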


2012 ◽  
Vol 108 (7) ◽  
pp. 2069-2081 ◽  
Author(s):  
Sungho Hong ◽  
Quinten Robberechts ◽  
Erik De Schutter

The phase-response curve (PRC), relating the phase shift of an oscillator to an external perturbation, is an important tool for studying neurons and their population behavior. It can be experimentally estimated by measuring the phase changes caused by probe stimuli. These stimuli, usually short pulses or continuous noise, have a much wider frequency spectrum than that of neuronal dynamics. This makes the experimental data high dimensional, while the number of data samples tends to be small. Current PRC estimation methods have not been optimized for efficiently discovering the relevant degrees of freedom from such data. We propose a systematic and efficient approach based on a recently developed signal processing theory called compressive sensing (CS). CS is a framework for recovering sparsely constructed signals from undersampled data and is suitable for extracting information about the PRC from finite but high-dimensional experimental measurements. We illustrate how the CS algorithm can be translated into an estimation scheme and demonstrate that our CS method produces good estimates of PRCs from simulated and experimental data, especially when the data size is so small that simple approaches such as naive averaging fail. We systematically analyzed the tradeoff between degrees of freedom and goodness-of-fit, which helps clarify which parts of the data carry the most predictive power. Our results illustrate that the finite size of neuroscientific data sets, compounded by their high dimensionality, can hamper studies of the neural code, and they suggest that CS is a good tool for overcoming this challenge.
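The authors' estimation scheme is specified in the paper. As a much-simplified sketch of the general idea, namely sparse recovery of a PRC expanded in a Fourier basis from few noisy phase-shift measurements, one might use an L1-penalized regression such as LASSO. The basis size, penalty, and synthetic data below are assumptions for illustration, not the authors' algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sketch: recover a phase-response curve (PRC) that is sparse in a Fourier
# basis from undersampled, noisy phase-shift measurements.
rng = np.random.default_rng(2)
n_basis, n_samples = 64, 30                      # high-dimensional PRC, few trials

phases = rng.uniform(0.0, 2 * np.pi, n_samples)  # stimulus phases of probe pulses
k = np.arange(1, n_basis // 2 + 1)
design = np.hstack([np.cos(np.outer(phases, k)),  # Fourier design matrix
                    np.sin(np.outer(phases, k))])

true_coef = np.zeros(n_basis)                     # sparse ground-truth coefficients
true_coef[[0, 1, n_basis // 2]] = [1.0, 0.5, -0.7]
phase_shifts = design @ true_coef + 0.05 * rng.standard_normal(n_samples)

# The L1 penalty promotes sparsity, recovering the few relevant harmonics
# even though the problem is underdetermined (30 samples, 64 unknowns).
model = Lasso(alpha=0.01, max_iter=50_000).fit(design, phase_shifts)
estimated_prc_coef = model.coef_
print(np.round(estimated_prc_coef[np.abs(estimated_prc_coef) > 0.05], 2))
```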


2016 ◽  
Vol 37 (1) ◽  
Author(s):  
Gintautas Jakimauskas ◽  
Marijus Radavičius ◽  
Jurgis Sušinskas

A simple, data-driven and computationally efficient procedure for testing independence of high-dimensional random vectors is proposed. The procedure is based on interpreting goodness-of-fit testing as a classification problem and combines a special sequential partition procedure with elements of sequential testing, resampling, and randomization. Monte Carlo simulations are carried out to assess the performance of the procedure.
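The authors' procedure involves a special sequential partition and sequential testing that are not reproduced here. As a highly simplified illustration of the classification view of independence testing only, one can compare classifier accuracy on true joint pairs (X, Y) against pairs with Y permuted (a proxy for the product of the marginals). The classifier choice, interaction features, and demo data below are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def classification_independence_check(x, y, seed=0):
    """Simplified classification view of independence testing: joint samples
    (x_i, y_i) get label 1; samples with y permuted (dependence broken) get
    label 0. Under independence the two classes are indistinguishable and
    cross-validated accuracy stays near chance (0.5)."""
    rng = np.random.default_rng(seed)
    y_perm = y[rng.permutation(len(y))]                   # break the x-y dependence

    # Include elementwise products so a linear classifier can pick up
    # joint dependence, not just the (identical) marginals.
    features_real = np.hstack([x, y, x * y])
    features_fake = np.hstack([x, y_perm, x * y_perm])
    features = np.vstack([features_real, features_fake])
    labels = np.r_[np.ones(len(x)), np.zeros(len(x))]

    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, features, labels, cv=5).mean()

# Hypothetical usage: accuracy well above 0.5 is evidence against independence.
rng = np.random.default_rng(3)
x = rng.standard_normal((500, 5))
y_dep = x + 0.5 * rng.standard_normal((500, 5))           # dependent on x
print(classification_independence_check(x, y_dep))
```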


2019 ◽  
Vol 23 ◽  
pp. 662-671
Author(s):  
Matthias Löffler

In this study, we consider PCA for Gaussian observations $X_1, \dots, X_n$ with covariance $\Sigma = \sum_i \lambda_i P_i$ in the 'effective rank' setting, with model complexity governed by $\mathbf{r}(\Sigma) := \operatorname{tr}(\Sigma)/\|\Sigma\|$. We prove a Berry–Esseen type bound for a Wald statistic of the spectral projector $\hat P_r$. This can be used to construct non-asymptotic goodness-of-fit tests and confidence ellipsoids for the spectral projectors $P_r$. Using higher-order perturbation theory, we show that our theorem remains valid even when $\mathbf{r}(\Sigma) \gg \sqrt{n}$.
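As a small numerical illustration of the quantities named in the abstract (not the paper's Wald statistic or its bound), the effective rank and an empirical spectral projector can be computed from a sample covariance as follows; the dimension, sample size, and spectrum below are arbitrary choices.

```python
import numpy as np

def effective_rank(cov):
    """Effective rank r(Sigma) = tr(Sigma) / ||Sigma||_op from the abstract."""
    return np.trace(cov) / np.linalg.norm(cov, 2)

def spectral_projector(cov, r):
    """Orthogonal projector onto the span of the top-r eigenvectors of cov."""
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = eigvecs[:, -r:]                    # top-r eigenvectors
    return top @ top.T

# Hypothetical example: compare the population projector P_r with the
# empirical projector built from the sample covariance.
rng = np.random.default_rng(4)
d, n, r = 50, 200, 3
eigenvalues = np.r_[10.0, 8.0, 6.0, np.full(d - 3, 0.5)]
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
Sigma = Q @ np.diag(eigenvalues) @ Q.T

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Sigma_hat = np.cov(X, rowvar=False)

print("effective rank:", round(effective_rank(Sigma), 2))
print("projector error:",
      round(np.linalg.norm(spectral_projector(Sigma_hat, r)
                           - spectral_projector(Sigma, r), "fro"), 3))
```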


2019 ◽  
Vol 31 (9) ◽  
pp. 1751-1788 ◽  
Author(s):  
Ali Yousefi ◽  
Ishita Basu ◽  
Angelique C. Paulk ◽  
Noam Peled ◽  
Emad N. Eskandar ◽  
...  

Cognitive processes, such as learning and cognitive flexibility, are difficult both to measure and to sample continuously using objective tools, because cognitive processes arise from distributed, high-dimensional neural activity. For both research and clinical applications, that dimensionality must be reduced. To reduce dimensionality and measure underlying cognitive processes, we propose a modeling framework in which a cognitive process is defined as a low-dimensional dynamical latent variable, called a cognitive state, that links high-dimensional neural recordings to multidimensional behavioral readouts. This framework allows us to decompose the hard problem of modeling the relationship between neural and behavioral data into separable encoding-decoding approaches. We first use a state-space modeling framework, the behavioral decoder, to articulate the relationship between an objective behavioral readout (e.g., response times) and the cognitive state. In the second step, the neural encoder, we use a generalized linear model (GLM) to identify the relationship between the cognitive state and neural signals, such as the local field potential (LFP). We then use the neural encoder model and a Bayesian filter to estimate the cognitive state from neural data (LFP power), yielding the neural decoder. We provide goodness-of-fit analyses and model selection criteria in support of the encoding-decoding results. We apply this framework to estimate an underlying cognitive state from neural data in human participants ([Formula: see text]) performing a cognitive conflict task. The estimated cognitive state fell within the 95% confidence intervals of the state estimated from the behavioral readout for an average of 90% of task trials across participants. In contrast to previous encoder-decoder models, our proposed modeling framework incorporates LFP spectral power to encode and decode a cognitive state. The framework allowed us to capture the temporal evolution of the underlying cognitive processes, which could be key to the development of closed-loop experiments and treatments.
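The paper's framework combines a state-space behavioral decoder, a GLM neural encoder, and a Bayesian filter. As a highly simplified sketch of the state-space idea alone, a random-walk cognitive state observed through noisy response times can be tracked with a scalar Kalman filter; all model choices, parameter values, and data here are illustrative assumptions, not the authors' specification.

```python
import numpy as np

def kalman_filter_1d(observations, q=0.01, r=0.5):
    """Scalar Kalman filter for a random-walk latent state x_t observed through
    y_t = x_t + noise. Here y_t stands in for a behavioral readout such as a
    (log) response time, and x_t for a one-dimensional cognitive state.
    q, r: process and observation noise variances (illustrative values)."""
    x, p = 0.0, 1.0                      # initial state mean and variance
    means, variances = [], []
    for y in observations:
        p_pred = p + q                   # predict: random-walk state transition
        k = p_pred / (p_pred + r)        # Kalman gain
        x = x + k * (y - x)              # update with the new observation
        p = (1 - k) * p_pred
        means.append(x)
        variances.append(p)
    return np.array(means), np.array(variances)

# Hypothetical usage: a slowly drifting cognitive state driving response times,
# with 95% credible bands around the filtered state estimate.
rng = np.random.default_rng(5)
true_state = np.cumsum(0.1 * rng.standard_normal(200))
response_times = true_state + 0.7 * rng.standard_normal(200)
state_mean, state_var = kalman_filter_1d(response_times)
ci_lower = state_mean - 1.96 * np.sqrt(state_var)
ci_upper = state_mean + 1.96 * np.sqrt(state_var)
```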


2021 ◽  
Vol 49 (1) ◽  
pp. 154-181 ◽  
Author(s):  
Yinqiu He ◽  
Gongjun Xu ◽  
Chong Wu ◽  
Wei Pan
