scholarly journals Asymptotic for Lasso Estimator in High-Dimensional Repeated Measurements Model

2021 ◽  
Vol 1818 (1) ◽  
pp. 012039
Author(s):  
Naser Oda Jassim ◽  
Andul Hussein Saber Al-Mouel
2020 ◽  
Author(s):  
Naser Oda Jassim ◽  
Abdul Hussein Saber Al-Mouel

2014 ◽  
Vol 2014 ◽  
pp. 1-7
Author(s):  
Shiqing Wang ◽  
Yan Shi ◽  
Limin Su

Regularity conditions play a pivotal role for sparse recovery in high-dimensional regression. In this paper, we present a weaker regularity condition and further discuss the relationships with other regularity conditions, such as restricted eigenvalue condition. We study the behavior of our new condition for design matrices with independent random columns uniformly drawn on the unit sphere. Moreover, the present paper shows that, under a sparsity scenario, the Lasso estimator and Dantzig selector exhibit similar behavior. Based on both methods, we derive, in parallel, more precise bounds for the estimation loss and the prediction risk in the linear regression model when the number of variables can be much larger than the sample size.


Biometrika ◽  
2019 ◽  
Vol 106 (3) ◽  
pp. 619-634 ◽  
Author(s):  
Ping-Shou Zhong ◽  
Runze Li ◽  
Shawn Santo

Summary This paper deals with the detection and identification of changepoints among covariances of high-dimensional longitudinal data, where the number of features is greater than both the sample size and the number of repeated measurements. The proposed methods are applicable under general temporal-spatial dependence. A new test statistic is introduced for changepoint detection, and its asymptotic distribution is established. If a changepoint is detected, an estimate of the location is provided. The rate of convergence of the estimator is shown to depend on the data dimension, sample size, and signal-to-noise ratio. Binary segmentation is used to estimate the locations of possibly multiple changepoints, and the corresponding estimator is shown to be consistent under mild conditions. Simulation studies provide the empirical size and power of the proposed test and the accuracy of the changepoint estimator. An application to a time-course microarray dataset identifies gene sets with significant gene interaction changes over time.


2020 ◽  
pp. 096228022094608
Author(s):  
Louis Capitaine ◽  
Robin Genuer ◽  
Rodolphe Thiébaut

Random forests are one of the state-of-the-art supervised machine learning methods and achieve good performance in high-dimensional settings where p, the number of predictors, is much larger than n, the number of observations. Repeated measurements provide, in general, additional information, hence they are worth accounted especially when analyzing high-dimensional data. Tree-based methods have already been adapted to clustered and longitudinal data by using a semi-parametric mixed effects model, in which the non-parametric part is estimated using regression trees or random forests. We propose a general approach of random forests for high-dimensional longitudinal data. It includes a flexible stochastic model which allows the covariance structure to vary over time. Furthermore, we introduce a new method which takes intra-individual covariance into consideration to build random forests. Through simulation experiments, we then study the behavior of different estimation methods, especially in the context of high-dimensional data. Finally, the proposed method has been applied to an HIV vaccine trial including 17 HIV-infected patients with 10 repeated measurements of 20,000 gene transcripts and blood concentration of human immunodeficiency virus RNA. The approach selected 21 gene transcripts for which the association with HIV viral load was fully relevant and consistent with results observed during primary infection.


2017 ◽  
Vol 45 (3) ◽  
pp. 1185-1213 ◽  
Author(s):  
Ping-Shou Zhong ◽  
Wei Lan ◽  
Peter X. K. Song ◽  
Chih-Ling Tsai

Author(s):  
Jiayi Hou ◽  
Kellie J. Archer

AbstractAn ordinal scale is commonly used to measure health status and disease related outcomes in hospital settings as well as in translational medical research. In addition, repeated measurements are common in clinical practice for tracking and monitoring the progression of complex diseases. Classical methodology based on statistical inference, in particular, ordinal modeling has contributed to the analysis of data in which the response categories are ordered and the number of covariates (


Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 230
Author(s):  
Fang Xie ◽  
Johannes Lederer

Recent discoveries suggest that our gut microbiome plays an important role in our health and wellbeing. However, the gut microbiome data are intricate; for example, the microbial diversity in the gut makes the data high-dimensional. While there are dedicated high-dimensional methods, such as the lasso estimator, they always come with the risk of false discoveries. Knockoffs are a recent approach to control the number of false discoveries. In this paper, we show that knockoffs can be aggregated to increase power while retaining sharp control over the false discoveries. We support our method both in theory and simulations, and we show that it can lead to new discoveries on microbiome data from the American Gut Project. In particular, our results indicate that several phyla that have been overlooked so far are associated with obesity.


2019 ◽  
Vol 13 ◽  
pp. 174830261986744
Author(s):  
Ran Zhang ◽  
Bin Ye ◽  
Peng Liu

Nowadays, datasets containing a very large number of variables or features are routinely generated in many fields. Dimension reduction techniques are usually performed prior to statistically analyzing these datasets in order to avoid the effects of the curse of dimensionality. Principal component analysis is one of the most important techniques for dimension reduction and data visualization. However, datasets with missing values arising in almost every field will produce biased estimates and are difficult to handle, especially in the high dimension, low sample size settings. By exploiting a Lasso estimator of the population covariance matrix, we propose to regularize the principal component analysis to reduce the dimensionality of dataset with missing data. The Lasso estimator of covariance matrix is computationally tractable by solving a convex optimization problem. To illustrate the effectiveness of our method on dimension reduction, the principal component directions are evaluated by the metrics of Frobenius norm and cosine distance. The performances are compared with other incomplete data handling methods such as mean substitution and multiple imputation. Simulation results also show that our method is superior to other incomplete data handling methods in the context of discriminant analysis of real world high-dimensional datasets.


Biometrika ◽  
2019 ◽  
Vol 106 (3) ◽  
pp. 533-546 ◽  
Author(s):  
Guo Yu ◽  
Jacob Bien

Summary The lasso has been studied extensively as a tool for estimating the coefficient vector in the high-dimensional linear model; however, considerably less is known about estimating the error variance in this context. In this paper, we propose the natural lasso estimator for the error variance, which maximizes a penalized likelihood objective. A key aspect of the natural lasso is that the likelihood is expressed in terms of the natural parameterization of the multi-parameter exponential family of a Gaussian with unknown mean and variance. The result is a remarkably simple estimator of the error variance with provably good performance in terms of mean squared error. These theoretical results do not require placing any assumptions on the design matrix or the true regression coefficients. We also propose a companion estimator, called the organic lasso, which theoretically does not require tuning of the regularization parameter. Both estimators do well empirically compared to pre-existing methods, especially in settings where successful recovery of the true support of the coefficient vector is hard. Finally, we show that existing methods can do well under fewer assumptions than previously known, thus providing a fuller story about the problem of estimating the error variance in high-dimensional linear models.


Sign in / Sign up

Export Citation Format

Share Document