Estimating the error variance in a high-dimensional linear model

Biometrika ◽  
2019 ◽  
Vol 106 (3) ◽  
pp. 533-546 ◽  
Author(s):  
Guo Yu ◽  
Jacob Bien

Summary The lasso has been studied extensively as a tool for estimating the coefficient vector in the high-dimensional linear model; however, considerably less is known about estimating the error variance in this context. In this paper, we propose the natural lasso estimator for the error variance, which maximizes a penalized likelihood objective. A key aspect of the natural lasso is that the likelihood is expressed in terms of the natural parameterization of the multi-parameter exponential family of a Gaussian with unknown mean and variance. The result is a remarkably simple estimator of the error variance with provably good performance in terms of mean squared error. These theoretical results do not require placing any assumptions on the design matrix or the true regression coefficients. We also propose a companion estimator, called the organic lasso, which theoretically does not require tuning of the regularization parameter. Both estimators do well empirically compared to pre-existing methods, especially in settings where successful recovery of the true support of the coefficient vector is hard. Finally, we show that existing methods can do well under fewer assumptions than previously known, thus providing a fuller story about the problem of estimating the error variance in high-dimensional linear models.
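The abstract does not state the estimator's formula. The sketch below assumes the form usually attributed to the natural lasso, namely that the variance estimate is the minimized value of the lasso objective, (1/n)||y - Xb||^2 + 2*lam*||b||_1. The function names, the fixed number of coordinate-descent sweeps, and the value of lam are illustrative choices of this sketch, not taken from the paper.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/n)*||y - X b||^2 + 2*lam*||b||_1 by cyclic coordinate descent.

    X is a list of n rows (each a list of p floats); y is a list of n floats.
    A fixed number of sweeps is used instead of a convergence test (assumption
    made to keep the sketch short). Returns (coefficients, residuals y - X b).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    # Mean squared value of each column, used to scale the coordinate update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta, resid


def natural_lasso_variance(X, y, lam):
    """Assumed natural lasso estimate: the minimised lasso objective value."""
    beta, resid = lasso_coordinate_descent(X, y, lam)
    n = len(y)
    return sum(r * r for r in resid) / n + 2.0 * lam * sum(abs(b) for b in beta)


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, sigma = 40, 80, 1.0  # more predictors than observations
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    beta_true = [2.0, -1.5, 1.0] + [0.0] * (p - 3)
    y = [sum(x * b for x, b in zip(row, beta_true)) + rng.gauss(0.0, sigma)
         for row in X]
    lam = 0.3  # arbitrary illustrative value; in practice it would be tuned
    print("estimated error variance:", natural_lasso_variance(X, y, lam))
```

Because b = 0 is always feasible, the estimate under this assumed form can never exceed the mean of the squared responses, and it equals that mean once lam is large enough to shrink every coefficient to zero.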

F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 512
Author(s):  
Charlotte Soneson ◽  
Federico Marini ◽  
Florian Geier ◽  
Michael I. Love ◽  
Michael B. Stadler

Linear and generalized linear models are used extensively in many scientific fields, to model observed data and as the basis for hypothesis tests. The use of such models requires specification of a design matrix, and subsequent formulation of contrasts representing scientific hypotheses of interest. Proper execution of these steps requires a thorough understanding of the meaning of the individual coefficients, and is a frequent source of uncertainty for end users. Here, we present an R/Bioconductor package, ExploreModelMatrix, which enables interactive exploration of design matrices and linear model diagnostics. Given a sample annotation table and a desired design formula, the package displays how the model coefficients are combined to give the fitted values for each combination of predictor variables, which allows users to both extract the interpretation of each individual coefficient, and formulate desired linear contrasts. In addition, the interactive interface displays informative characteristics for the regular linear model corresponding to the provided design, such as variance inflation factors and the pseudoinverse of the design matrix.
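To make the idea of "how the model coefficients are combined to give the fitted values" concrete, here is a minimal Python sketch. It does not use or reproduce the ExploreModelMatrix package, which is an R/Bioconductor tool; the function names, the sample table, and the choice of the first sorted level as reference are all assumptions of this illustration, for an additive design with treatment (dummy) coding.

```python
def treatment_design(samples, factors):
    """Build an intercept-plus-dummy design matrix for an additive model.

    samples: list of dicts mapping factor name -> level (hypothetical table).
    factors: ordered list of factor names to include.
    The first level in sorted order is the reference level (sketch assumption).
    Returns (coefficient names, design matrix as a list of rows).
    """
    names = ["(Intercept)"]
    coding = []  # (factor, level) represented by each non-intercept column
    for factor in factors:
        levels = sorted({s[factor] for s in samples})
        for level in levels[1:]:
            names.append(factor + level)
            coding.append((factor, level))
    rows = [[1] + [1 if s[f] == lev else 0 for f, lev in coding]
            for s in samples]
    return names, rows


def fitted_value_terms(samples, factors):
    """For each combination of levels, list the coefficients that are summed."""
    names, rows = treatment_design(samples, factors)
    terms = {}
    for sample, row in zip(samples, rows):
        key = tuple(sample[f] for f in factors)
        terms[key] = " + ".join(n for n, x in zip(names, row) if x)
    return terms


if __name__ == "__main__":
    # Hypothetical sample annotation table with two two-level factors.
    table = [
        {"genotype": "A", "treatment": "ctrl"},
        {"genotype": "A", "treatment": "trt"},
        {"genotype": "B", "treatment": "ctrl"},
        {"genotype": "B", "treatment": "trt"},
    ]
    for combo, expr in fitted_value_terms(table, ["genotype", "treatment"]).items():
        print(combo, "->", expr)
```

Reading the output row by row shows, for instance, that the coefficient for the non-reference treatment level is the difference between treated and control samples within a genotype, which is the kind of interpretation the abstract refers to.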


2005 ◽  
Vol 57 (3-4) ◽  
pp. 143-160
Author(s):  
Z. Hoque ◽  
B. Billah ◽  
S. Khan

In this paper we propose a shrinkage preliminary test estimator (SPTE) of the coefficient vector in the multiple linear regression model based on the size-corrected Wald (W), likelihood ratio (LR) and Lagrangian multiplier (LM) tests. The correction factors used are those obtained from degrees-of-freedom corrections to the estimate of the error variance and those obtained from the second-order Edgeworth approximations to the exact distributions of the test statistics. The bias and weighted mean squared error (WMSE) functions of the estimators are derived. With respect to WMSE, the efficiencies of the SPTEs relative to the maximum likelihood estimator are calculated. This study shows that the amount of conflict can be substantial when the three tests are based on the same asymptotic chi-square critical value. The conflict among the SPTEs is due to the asymptotic tests not having the correct significance level. The Edgeworth size-corrected W, LR and LM tests reduce the conflict remarkably.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 512
Author(s):  
Charlotte Soneson ◽  
Federico Marini ◽  
Florian Geier ◽  
Michael I. Love ◽  
Michael B. Stadler

Linear and generalized linear models are used extensively in many scientific fields, to model observed data and as the basis for hypothesis tests. The use of such models requires specification of a design matrix, and subsequent formulation of contrasts representing scientific hypotheses of interest. Proper execution of these steps requires a thorough understanding of the meaning of the individual coefficients, and is a frequent source of uncertainty for end users. Here, we present an R/Bioconductor package, ExploreModelMatrix, which enables interactive exploration of design matrices and linear model diagnostics. Given a sample data table and a desired design formula, the package displays how the model coefficients are combined to give the fitted values for each combination of predictor variables, which allows users to both extract the interpretation of each individual coefficient, and formulate desired linear contrasts. In addition, the interactive interface displays informative characteristics for the regular linear model corresponding to the provided design, such as variance inflation factors and the pseudoinverse of the design matrix. We envision the package and the built-in collection of common types of linear model designs to be useful for teaching and self-learning purposes, as well as for assisting more experienced users in the interpretation of complex model designs.
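One of the diagnostics mentioned above, the variance inflation factor (VIF), can be sketched in a few lines. The code below uses the standard identity that the VIF of predictor j is the j-th diagonal entry of the inverse of the predictors' correlation matrix. It is an illustration only: the function names are invented here, the intercept column is assumed to be excluded, and how the package itself computes or reports VIFs is not described in the abstract.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of predictor columns (lists of equal length)."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sum(v * v for v in col) ** 0.5 for col in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j = [R^{-1}]_{jj}, with R the correlation matrix of the predictors."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


if __name__ == "__main__":
    # Hypothetical dummy columns (intercept excluded) for two designs.
    balanced = [[0, 0, 1, 1], [0, 1, 0, 1]]
    unbalanced = [[0, 0, 1, 1, 1], [0, 1, 0, 1, 1]]
    print("balanced design:  ", variance_inflation_factors(balanced))
    print("unbalanced design:", variance_inflation_factors(unbalanced))
```

A balanced two-factor design has uncorrelated dummy columns and therefore VIFs of 1; an unbalanced table makes the columns correlated and pushes the VIFs above 1.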


Author(s):  
Christoph Brandstetter ◽  
Sina Stapelfeldt

Non-synchronous vibrations (NSV) arising near the stall boundary of compressors are a recurring and potentially safety-critical problem in modern aero-engines. Recent numerical and experimental investigations have shown that these vibrations are caused by the lock-in of circumferentially convected aerodynamic disturbances and structural vibration modes, and that it is possible to predict unstable vibration modes using coupled linear models. This paper aims to further investigate non-synchronous vibrations by casting a reduced model for NSV in the frequency domain and analysing stability for a range of parameters. It is shown how, and why, under certain conditions linear models are able to capture a phenomenon that has traditionally been associated with aerodynamic non-linearities. The formulation clearly highlights the differences between convective non-synchronous vibrations and flutter and identifies the modifications necessary to make quantitative predictions.


2021 ◽  
Vol 12 (3) ◽  
pp. 102
Author(s):  
Jaouad Khalfi ◽  
Najib Boumaaz ◽  
Abdallah Soulmani ◽  
El Mehdi Laadissi

The Box–Jenkins model is a polynomial model that uses transfer functions to express relationships between input, output, and noise for a given system. In this article, we present a Box–Jenkins linear model for a lithium-ion battery cell for use in electric vehicles. The model parameters are identified from automotive drive-cycle measurements. The prediction performance of the proposed model is evaluated using the goodness-of-fit criterion and the mean squared error between the Box–Jenkins model output and the measured battery cell output. A simulation confirmed that the proposed Box–Jenkins model could adequately capture the battery cell dynamics for different automotive drive cycles and reasonably predict the actual battery cell output. The goodness-of-fit values show that the Box–Jenkins model matches the battery cell data to 86.85% in the identification phase and 90.83% in the validation phase for the LA-92 driving cycle. This work demonstrates the potential of using a simple linear model to predict the battery cell behavior based on a complex identification dataset that represents the actual use of the battery cell in an electric vehicle.
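The abstract does not define its goodness-of-fit criterion. A common choice in system identification is the normalized root-mean-square-error fit, 100 * (1 - ||y - y_hat|| / ||y - mean(y)||), and the sketch below assumes that definition alongside the ordinary mean squared error. The function names and the voltage samples are made up for illustration and are not data from the article.

```python
import math


def mean_squared_error(measured, predicted):
    """Average squared difference between measured and predicted outputs."""
    return sum((y - yh) ** 2 for y, yh in zip(measured, predicted)) / len(measured)


def fit_percent(measured, predicted):
    """NRMSE-based fit in percent (assumed definition, see lead-in).

    100 means a perfect match; 0 means the model does no better than
    predicting the mean of the measured signal; negative values are worse.
    """
    mean_y = sum(measured) / len(measured)
    error_norm = math.sqrt(sum((y - yh) ** 2 for y, yh in zip(measured, predicted)))
    spread_norm = math.sqrt(sum((y - mean_y) ** 2 for y in measured))
    if spread_norm == 0.0:
        raise ValueError("measured signal is constant; fit is undefined")
    return 100.0 * (1.0 - error_norm / spread_norm)


if __name__ == "__main__":
    # Made-up cell-voltage samples (volts), purely illustrative.
    measured = [3.70, 3.68, 3.65, 3.66, 3.71, 3.69]
    predicted = [3.69, 3.68, 3.66, 3.66, 3.70, 3.70]
    print("MSE:", mean_squared_error(measured, predicted))
    print("fit (%):", fit_percent(measured, predicted))
```

Under this definition, a reported fit such as 86.85% would mean that the norm of the prediction error is about 13% of the norm of the measured signal's deviation from its mean.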

