Homogeneity tests of covariance matrices with high-dimensional longitudinal data

Biometrika ◽  
2019 ◽  
Vol 106 (3) ◽  
pp. 619-634 ◽  
Author(s):  
Ping-Shou Zhong ◽  
Runze Li ◽  
Shawn Santo

Summary This paper deals with the detection and identification of changepoints among covariances of high-dimensional longitudinal data, where the number of features is greater than both the sample size and the number of repeated measurements. The proposed methods are applicable under general temporal-spatial dependence. A new test statistic is introduced for changepoint detection, and its asymptotic distribution is established. If a changepoint is detected, an estimate of the location is provided. The rate of convergence of the estimator is shown to depend on the data dimension, sample size, and signal-to-noise ratio. Binary segmentation is used to estimate the locations of possibly multiple changepoints, and the corresponding estimator is shown to be consistent under mild conditions. Simulation studies provide the empirical size and power of the proposed test and the accuracy of the changepoint estimator. An application to a time-course microarray dataset identifies gene sets with significant gene interaction changes over time.
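Binary segmentation, mentioned above, is a generic recursive procedure. It tests the whole series for a single change and, if the statistic exceeds a threshold, splits at the most likely location and repeats on each side. The sketch below illustrates only that recursion. As a simplifying assumption, it uses a scalar series and a standard CUSUM statistic for a change in mean, not the covariance-based statistic from the paper. The threshold is an arbitrary illustrative value rather than a calibrated critical value, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def cusum_split(x: List[float], start: int, end: int, min_size: int) -> Tuple[float, Optional[int]]:
    """Return (max standardised CUSUM, split index) for the segment x[start:end].

    Simplifying assumption: a change in *mean* of a scalar series stands in
    for the covariance changes studied in the paper.
    """
    n = end - start
    prefix = [0.0]
    for v in x[start:end]:
        prefix.append(prefix[-1] + v)
    total = prefix[-1]
    best_val, best_k = 0.0, None
    for m in range(min_size, n - min_size + 1):
        left_mean = prefix[m] / m
        right_mean = (total - prefix[m]) / (n - m)
        # Standardised difference between the left and right segment means
        stat = math.sqrt(m * (n - m) / n) * abs(left_mean - right_mean)
        if stat > best_val:
            best_val, best_k = stat, start + m
    return best_val, best_k


def binary_segmentation(x: List[float], threshold: float, start: int = 0,
                        end: Optional[int] = None, min_size: int = 2) -> List[int]:
    """Recursively locate changepoints; returns split indices in increasing order."""
    if end is None:
        end = len(x)
    if end - start < 2 * min_size:
        return []
    stat, k = cusum_split(x, start, end, min_size)
    if k is None or stat <= threshold:
        return []  # no significant change in this segment
    # Detected a change at k: search the two sub-segments independently
    return (binary_segmentation(x, threshold, start, k, min_size)
            + [k]
            + binary_segmentation(x, threshold, k, end, min_size))
```

For example, `binary_segmentation([0.0]*20 + [5.0]*20 + [0.0]*20, threshold=3.0)` recovers both changes at indices 20 and 40. The consistency result quoted above concerns the paper's own statistic and rate conditions, not this toy version.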

2019 ◽  
Author(s):  
Eric Klopp ◽  
Stefan Klößner

In this contribution, we investigate the effects of manifest residual variance, indicator communality and sample size on the χ2-test statistic of the metric measurement invariance model, i.e. the model with equality constraints on all loadings. We demonstrate by means of Monte Carlo studies that the χ2-test statistic relates inversely to manifest residual variance, whereas sample size and χ2-test statistic show the well-known proportional relation. Moreover, we consider indicator communality as a key factor for the size of the χ2-test statistic. In this context, we introduce the concept of signal-to-noise ratio as a tool for studying the effects of manifest residual error and indicator communality and demonstrate its use with some examples. Finally, we discuss the limitations of this contribution and its practical implication for the analysis of metric measurement invariance models.


2020 ◽  
pp. 096228022094608
Author(s):  
Louis Capitaine ◽  
Robin Genuer ◽  
Rodolphe Thiébaut

Random forests are one of the state-of-the-art supervised machine learning methods and achieve good performance in high-dimensional settings where p, the number of predictors, is much larger than n, the number of observations. Repeated measurements generally provide additional information, so they are worth accounting for, especially when analyzing high-dimensional data. Tree-based methods have already been adapted to clustered and longitudinal data by using a semi-parametric mixed effects model, in which the non-parametric part is estimated using regression trees or random forests. We propose a general approach to random forests for high-dimensional longitudinal data. It includes a flexible stochastic model that allows the covariance structure to vary over time. Furthermore, we introduce a new method that takes intra-individual covariance into consideration to build random forests. Through simulation experiments, we then study the behavior of different estimation methods, especially in the context of high-dimensional data. Finally, the proposed method has been applied to an HIV vaccine trial including 17 HIV-infected patients with 10 repeated measurements of 20,000 gene transcripts and blood concentration of human immunodeficiency virus RNA. The approach selected 21 gene transcripts for which the association with HIV viral load was fully relevant and consistent with results observed during primary infection.


2017 ◽  
Vol 1 (2) ◽  
pp. 118
Author(s):  
Knavoot Jiamwattanapong ◽  
Samruam Chongcharoen

Modern measurement technology has enabled researchers and statisticians to capture high-dimensional data, and classical statistical inference procedures, such as the renowned Hotelling's T² test, are no longer valid when the dimension of the data equals or exceeds the sample size. Importantly, when correlations exist among the variables in a dataset, taking them into account in the analysis can lead to more accurate conclusions. In this article, we consider the problem of testing the equality of two mean vectors for high-dimensional data under a normality assumption. A new test is proposed, based on the idea of retaining more information from the sample covariances. The asymptotic null distribution of the test statistic is derived. The simulation results show that the proposed test performs well compared with other competing tests and becomes more powerful as the dimension increases for a given sample size. The proposed test is also illustrated with an analysis of DNA microarray data.


2019 ◽  
Vol 09 (03) ◽  
pp. 2050009
Author(s):  
Jing Chen ◽  
Xiaoyi Wang ◽  
Shurong Zheng ◽  
Baisen Liu ◽  
Ning-Zhong Shi

In this paper, we propose new tests for high-dimensional covariance matrices that are applicable to generally distributed populations with finite fourth moments. The proposed test statistics are the maximum of the likelihood ratio test statistic and a statistic based on the Frobenius norm. The advantage of the new tests is their good power both in the traditional case, in which the dimension is much smaller than the sample size, and in the high-dimensional case, in which the dimension is large compared with the sample size. In the one-sample case, the new test is proposed for testing the hypothesis that the high-dimensional covariance matrix equals the identity matrix. In the two-sample case, the new test is developed for testing the equality of two high-dimensional covariance matrices. Using random matrix theory, the asymptotic distributions of the proposed tests are derived under the assumption that the dimension and the sample size tend to infinity proportionally. Finally, numerical studies are conducted to investigate the finite-sample performance of the proposed tests.
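For intuition about the Frobenius-norm component, the naive plug-in quantity below estimates the squared Frobenius distance between two population covariance matrices by the squared Frobenius distance between their sample covariance matrices. It is a sketch only. It is not the proposed statistic, it omits the likelihood-ratio component, and the plug-in is biased upward when the dimension is large relative to the sample size. This bias is why high-dimensional tests typically use bias-corrected estimators and calibrate them with asymptotic theory. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def sample_cov(data: Matrix) -> Matrix:
    """Unbiased sample covariance of an n x p data matrix (rows are observations)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for row in data:
        centred = [row[j] - means[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += centred[a] * centred[b]
    return [[c / (n - 1) for c in r] for r in cov]


def frobenius_distance_sq(s1: Matrix, s2: Matrix) -> float:
    """Squared Frobenius norm of s1 - s2: the sum of squared entrywise differences.

    Naive plug-in only; not bias-corrected for high dimensions.
    """
    return sum((a - b) ** 2 for r1, r2 in zip(s1, s2) for a, b in zip(r1, r2))
```

A value near zero suggests similar sample covariances, but without a calibrated null distribution the number alone does not give a test.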


2020 ◽  
Author(s):  
Eric Klopp ◽  
Stefan Klößner

We investigate the effects of manifest residual variances, indicator communalities, and sample size on the χ2-test statistic of the metric measurement invariance model when the model is misspecified, i.e., when at least one population loading violates metric measurement invariance. First, we demonstrate that the choice of scaling method does not affect the model's χ2-test statistic. We then demonstrate that the χ2-test statistic relates inversely to manifest residual variances, whereas sample size and the χ2-test statistic show a positive relation. Moreover, we consider indicator communality as a key factor for the size of the χ2-test statistic. In this context, we introduce the concept of signal-to-noise ratio as a tool for studying the effects of manifest residual variance and indicator communality and demonstrate its use with an example. Finally, we discuss the limitations and the practical implications for the analysis of metric measurement invariance models.


Author(s):  
Karin Biering ◽  
Morten Frydenberg ◽  
Helle Pappot ◽  
Niels Henrik Hjollund

Abstract Purpose Fatigue following breast cancer is a well-known problem, with both high and persistent prevalence. Previous studies suffer from a lack of repeated measurements, late recruitment and short periods of follow-up. Neither the course of fatigue from diagnosis and treatment to the long-term outcome nor the differences in the level of fatigue between treatment regimens are known. The purpose of this study was to describe the long-term course of fatigue from the time of clinical suspicion of breast cancer, its dependence on patient characteristics and treatment regimens, and how it compares with the course of fatigue among women with the same suspicion who were not diagnosed with breast cancer. Methods Three hundred thirty-two women referred for acute or subacute mammography were followed with questionnaires from before the mammography for up to 1500 days. Fatigue was measured with the Multidimensional Fatigue Inventory (MFI-20). The women reported their initial level of fatigue before the mammography and thus without knowledge of whether they had cancer. Both women with and without cancer were followed. Women with cancer were identified in the clinical database established by the Danish Breast Cancer Cooperative Group (DBCG), which was used to collect information on treatment regimen. Results Compared with fatigue scores before diagnosis, women with breast cancer reported a large increase in fatigue, especially in the first 6 months, followed by a slow decrease over time. Despite the long follow-up period, the women with breast cancer did not return to the level of fatigue they reported at the time of the mammography. Women without breast cancer experienced a rapid decrease in fatigue after the diagnosis was ruled out, followed by a steadier period. Conclusions Fatigue is a persistent problem in women diagnosed with breast cancer, even several years after diagnosis and treatment. The women with breast cancer were most affected by fatigue in the first 6 months after diagnosis.


Author(s):  
Markus Ekvall ◽  
Michael Höhle ◽  
Lukas Käll

Abstract Motivation Permutation tests offer a straightforward framework to assess the significance of differences in sample statistics. A significant advantage of permutation tests is that they need relatively few assumptions about the distribution of the test statistic, as they rely only on the exchangeability of the group labels. They are of great value because they allow a sensitivity analysis to determine the extent to which the assumed broad sample distribution of the test statistic applies. However, permutation tests are rarely applied in practice, because naïve implementations are too slow and their running time grows exponentially with the sample size. Nevertheless, continued development in the 1980s introduced dynamic programming algorithms that compute exact permutation tests in polynomial time. Despite this significant reduction in running time, the exact test has not yet become one of the predominant statistical tests for medium sample sizes. Here, we propose a computational parallelization of one such dynamic programming-based permutation test, the Green algorithm, which makes the permutation test more attractive. Results Parallelization of the Green algorithm was found to be possible through a non-trivial rearrangement of the structure of the algorithm. A speed-up of orders of magnitude is achievable by executing the parallelized algorithm on a GPU. We demonstrate that execution time essentially becomes a non-issue for sample sizes as large as hundreds of samples. This improvement makes our method an attractive alternative to, e.g., the widely used asymptotic Mann-Whitney U-test. Availability and implementation Python 3 code is available from the GitHub repository https://github.com/statisticalbiotechnology/parallelPermutationTest under an Apache 2.0 license. Supplementary information Supplementary data are available at Bioinformatics online.
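The dynamic-programming idea that makes exact permutation tests polynomial can be illustrated with a subset-sum count. Instead of enumerating all C(n, m) relabellings, the procedure counts how many size-m subsets of the pooled sample reach each possible sum. The sketch below is a minimal sequential version that assumes non-negative integer data, the sum of group A as the test statistic, and a one-sided p-value. It is not the authors' parallel GPU implementation, and the function name is hypothetical.

```python
from math import comb
from typing import List


def exact_perm_pvalue(a: List[int], b: List[int]) -> float:
    """One-sided exact permutation p-value P(sum of group A >= observed).

    Assumes non-negative integer observations. Runs in O(n * m * S) time,
    where S is the total of all values, instead of enumerating C(n, m) splits.
    """
    z = a + b
    m, n = len(a), len(z)
    if any(v < 0 for v in z):
        raise ValueError("this sketch assumes non-negative integers")
    max_sum = sum(z)
    # counts[k][s] = number of size-k subsets of the values seen so far with sum s
    counts = [[0] * (max_sum + 1) for _ in range(m + 1)]
    counts[0][0] = 1
    for v in z:
        # Iterate k downwards so each value enters a subset at most once (0/1 knapsack)
        for k in range(m, 0, -1):
            prev, cur = counts[k - 1], counts[k]
            for s in range(max_sum, v - 1, -1):
                cur[s] += prev[s - v]
    observed = sum(a)
    # Every relabelling is equally likely under exchangeability
    extreme = sum(counts[m][observed:])
    return extreme / comb(n, m)
```

Passing the ranks 1..n (with no ties) instead of raw values turns the same count into the exact null distribution of the Wilcoxon rank-sum statistic, which is equivalent to the Mann-Whitney U-test. The cost grows with the total of the values, which is one reason practical implementations work with ranks or other small integer scores.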


2019 ◽  
Vol 44 (3) ◽  
pp. 309-319 ◽  
Author(s):  
Joshua S. Jackman ◽  
Phillip G. Bell ◽  
Simone Gill ◽  
Ken van Someren ◽  
Gareth W. Davison ◽  
...  

A variety of strategies exist to modulate the acute physiological responses following resistance exercise, aimed at enhancing recovery and/or adaptation processes. To assess the true impact of these strategies, it is important to know the ability of different measures to detect meaningful change. We investigated the sensitivity of measures used to quantify acute physiological responses to resistance exercise and constructed a physiological profile to characterise the magnitude of change and the time course of these responses. Eight males accustomed to regular resistance exercise performed experimental sessions during a "control week", devoid of an exercise stimulus. The following week, termed the "exercise week", participants repeated this sequence of experimental sessions and also performed a bout of lower-limb resistance exercise following the baseline assessments. Assessments were conducted at baseline and at 2, 6, 24, 48, 72, and 96 h after the intervention. On the basis of the signal-to-noise ratio, the most sensitive measures, all with ratios >1.5, were maximal voluntary isometric contraction, 20-m sprint, countermovement jump peak force, rate of force development (100–200 ms), muscle soreness, Daily Analysis Of Life Demands For Athletes part B, limb girth, matrix metalloproteinase-9, interleukin-6, creatine kinase, and high-sensitivity C-reactive protein. Clear changes in these measures following resistance exercise were determined via magnitude-based inferences. These findings highlight measures that can detect real changes in acute physiological responses following resistance exercise in trained individuals. Researchers investigating strategies to manipulate acute physiological responses for recovery and/or adaptation can use these measures, as well as the recommended sampling points, to be confident that their interventions are making a worthwhile impact.
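The abstract does not spell out how the signal-to-noise ratio was computed. Purely as an assumption for illustration, the sketch below takes the signal to be the mean change from baseline at one time point in the exercise week. It takes the noise to be the standard deviation of the corresponding change in the control week, and it flags a measure as sensitive when the ratio exceeds 1.5, the cut-off quoted above. The function names and any input data are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def signal_to_noise(exercise_changes: Sequence[float],
                    control_changes: Sequence[float]) -> float:
    """Hypothetical SNR: |mean exercise-week change| / SD of control-week change.

    Each sequence holds one change-from-baseline value per participant at the
    same time point; this definition is an assumption, not the study's formula.
    """
    noise = stdev(control_changes)  # day-to-day variability without exercise
    if noise == 0:
        raise ValueError("control-week changes have zero spread")
    return abs(mean(exercise_changes)) / noise


def is_sensitive(exercise_changes: Sequence[float],
                 control_changes: Sequence[float],
                 cutoff: float = 1.5) -> bool:
    """Flag a measure whose hypothetical SNR exceeds the cut-off."""
    return signal_to_noise(exercise_changes, control_changes) > cutoff
```

Under this definition, a measure with large day-to-day variability needs a correspondingly larger exercise-induced change to pass the 1.5 cut-off.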


Biometrika ◽  
2020 ◽  
Author(s):  
Oliver Dukes ◽  
Stijn Vansteelandt

Summary Eliminating the effect of confounding in observational studies typically involves fitting a model for an outcome adjusted for covariates. When, as often, these covariates are high-dimensional, this necessitates the use of sparse estimators, such as the lasso, or other regularization approaches. Naïve use of such estimators yields confidence intervals for the conditional treatment effect parameter that are not uniformly valid. Moreover, as the number of covariates grows with the sample size, correctly specifying a model for the outcome is nontrivial. In this article we deal with both of these concerns simultaneously, obtaining confidence intervals for conditional treatment effects that are uniformly valid, regardless of whether the outcome model is correct. This is done by incorporating an additional model for the treatment selection mechanism. When both models are correctly specified, we can weaken the standard conditions on model sparsity. Our procedure extends to multivariate treatment effect parameters and complex longitudinal settings.

