Performance analysis on least absolute shrinkage selection operator, elastic net and correlation adjusted elastic net regression methods

Author(s):  
Pascalis Kadaro Matthew ◽  
Abubakar Yahaya

Over the past few decades, penalized regression techniques for linear regression have been developed specifically to address the weaknesses in prediction accuracy of the classical ordinary least squares (OLS) regression technique. In this paper, we used a diabetes data set obtained from previous literature to compare three of these well-known techniques, namely Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net and Correlation Adjusted Elastic Net (CAEN). After thorough analysis, it was observed that CAEN generated a less complex model.
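For reference, the LASSO and Elastic Net estimators can be written in one common parameterization as below. The symbols $\lambda$, $\lambda_1$ and $\lambda_2$ are notation assumed for this sketch and are not taken from the paper; the CAEN penalty is not shown because the abstract does not state it.

```latex
\hat{\beta}^{\mathrm{LASSO}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1

\hat{\beta}^{\mathrm{EN}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1 + \lambda_2 \lVert \beta \rVert_2^2
```

Setting $\lambda_2 = 0$ in the Elastic Net objective recovers LASSO, and setting $\lambda_1 = 0$ recovers ridge regression.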

2009 ◽  
Vol 2009 ◽  
pp. 1-8 ◽  
Author(s):  
Janet Myhre ◽  
Daniel R. Jeske ◽  
Michael Rennie ◽  
Yingtao Bi

A heteroscedastic linear regression model is developed from plausible assumptions that describe the time evolution of performance metrics for equipment. The inherited motivation for the related weighted least squares analysis of the model is an essential and attractive selling point to engineers with interest in equipment surveillance methodologies. A simple test for the significance of the heteroscedasticity suggested by a data set is derived and a simulation study is used to evaluate the power of the test and compare it with several other applicable tests that were designed under different contexts. Tolerance intervals within the context of the model are derived, thus generalizing well-known tolerance intervals for ordinary least squares regression. Use of the model and its associated analyses is illustrated with an aerospace application where hundreds of electronic components are continuously monitored by an automated system that flags components that are suspected of unusual degradation patterns.
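A generic weighted least squares fit, the kind of analysis this abstract refers to, can be sketched as follows. This is a minimal illustration and not the authors' model: the function name `weighted_least_squares`, the synthetic data, and the assumption that the noise standard deviation grows linearly with time are all hypothetical.

```python
import numpy as np

def weighted_least_squares(X, y, weights):
    """Solve min_b sum_i w_i * (y_i - x_i @ b)^2 via the normal equations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    Xw = X * w[:, None]                     # each row of X scaled by its weight
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

# Hypothetical example: noise standard deviation grows with time t,
# so each observation is weighted by 1 / variance.
t = np.arange(1.0, 11.0)
X = np.column_stack([np.ones_like(t), t])   # intercept and slope columns
rng = np.random.default_rng(0)
sigma = 0.1 * t
y = 2.0 + 0.5 * t + rng.normal(0.0, sigma)
beta_hat = weighted_least_squares(X, y, 1.0 / sigma**2)
```

Weighting by the inverse variance gives noisy late observations less influence on the fit than precise early ones.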


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
ZhenKai Cui ◽  
Cheng Wang ◽  
Jianwei Chen ◽  
Ting He

To address the large condition numbers at natural frequencies and the low prediction accuracy that arise when multivariate multiple linear regression is used alone for vibration response prediction, an elastic-net regularization method is proposed. First, a multi-input, multi-output linear regression model of the multipoint frequency-domain vibration response is trained using historical data at each frequency point. Second, the trained model at each frequency point is improved by elastic-net regularization. Finally, the model is applied under working conditions. Predictions on an experimental dataset of cylindrical-shell acoustic vibration showed that applying elastic-net regularization to the multivariate regression model improves prediction accuracy and reduces the large condition numbers at some frequencies.
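One way the ridge (L2) component of an elastic-net penalty helps with an ill-conditioned regression problem is by adding a multiple of the identity to the Gram matrix. The sketch below illustrates this on synthetic, nearly collinear inputs; the data and the value `lam2 = 0.1` are hypothetical and unrelated to the paper's vibration dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-4 * rng.normal(size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])

gram = X.T @ X                        # matrix inverted by ordinary least squares
lam2 = 0.1                            # hypothetical L2 penalty weight

cond_plain = np.linalg.cond(gram)
cond_penalized = np.linalg.cond(gram + lam2 * np.eye(2))

print(f"condition number without penalty: {cond_plain:.3e}")
print(f"condition number with L2 penalty: {cond_penalized:.3e}")
```

For a symmetric positive semidefinite Gram matrix, adding `lam2` to every eigenvalue raises the smallest one proportionally more than the largest, which lowers the condition number.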


2018 ◽  
Vol 2 (2) ◽  
pp. 7-14
Author(s):  
Resty Fanny ◽  
Anik Djuraidah ◽  
Aam Alamudi

Regression analysis is a statistical technique for examining and modeling the relationship between a dependent variable and independent variables. Multiple linear regression includes more than one independent variable. Multicollinearity in multiple linear regression occurs when the independent variables are correlated. Multicollinearity causes the ordinary least squares estimator to be unstable and to have a large variance. Multicollinearity can be overcome by adding a penalty on the regression coefficients. The purpose of this research is to fit ridge regression, LASSO, and elastic-net models. The data use the fishermen's catch at Carocok Beach, Tarusan, Sumatera Barat as the dependent variable, and amount of labor, amount of fuel, volume of fishing/waring boat, number of catches, ship size, number of boat wattage, sea experience, education and age of the fisher as independent variables. The best model is given by LASSO, whose validated RMSEP is lower than those of ridge regression and elastic-net. LASSO shrank the coefficients of amount of labor, amount of fuel and number of wattage to zero. The variables that can influence the catch (productivity change) are the volume of fishing/waring boat and the size of the boat used by the fisher.
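Why LASSO can set coefficients exactly to zero while ridge only shrinks them is easiest to see in the special case of an orthonormal design. Assuming the objectives (1/2)||y - Xb||^2 + lam * ||b||_1 for LASSO and (1/2)||y - Xb||^2 + (lam/2) * ||b||^2 for ridge, each penalized coefficient is a simple function of the OLS coefficient. The coefficient values and `lam` below are hypothetical and not taken from the paper.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return exactly 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

# Hypothetical OLS coefficients under an orthonormal design (X.T @ X = I).
ols_coefs = [3.0, -0.4, 0.2, -2.5]
lam = 0.5

# LASSO: soft-threshold each OLS coefficient; small ones become exactly zero.
lasso_coefs = [soft_threshold(b, lam) for b in ols_coefs]

# Ridge: rescale each OLS coefficient; none becomes exactly zero.
ridge_coefs = [b / (1.0 + lam) for b in ols_coefs]

print(lasso_coefs)   # [2.5, 0.0, 0.0, -2.0]
print(ridge_coefs)
```

With correlated predictors there is no such closed form, but the same soft-thresholding step appears inside coordinate descent solvers for LASSO and elastic-net.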


2021 ◽  
Author(s):  
Ahmad Roumiani ◽  
Abbas Mofidi

Abstract. Accounting for human activities in terms of grazing land, infrastructure, crops, forest products and carbon impact, the so-called ecological footprint (EF), is one of the most important economic issues in the world. In the present study, data from global databases were used. The ability of penalized regression (PR, including Ridge, Lasso and Elastic Net) and artificial neural networks (ANN) to predict EF indices in the G-20 over the past two decades (1999–2018) was evaluated and compared. For this purpose, 10-fold cross-validation was used to assess predictive performance and to select the penalty parameter for the PR models. Based on the results, PR gave a slight improvement in prediction performance over linear regression. The Elastic Net model selected more global macro indices than Lasso. Although Lasso included only some indicators, it still had better predictive performance than the other PR models. Although the PR methods were only slightly better than linear regression, their value in selecting a subset of controllable indicators by shrinking the coefficients and creating a parsimonious model was apparent. As a result, penalized regression methods would be preferred on the grounds of feature selection and interpretability rather than predictive performance alone. On the other hand, neural network-based models, with higher coefficients of determination (R2) and lower RMSE values than PR and OLS, performed notably well and were more accurate in predicting EF. The results showed that the ANN could provide considerable and appropriate predictions for EF indicators in the G-20 countries.
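Choosing a penalty parameter by 10-fold cross-validation can be sketched as follows. Ridge regression is used here only because it has a closed-form solution; the study also used Lasso and Elastic Net, which need an iterative solver. The function names, the synthetic data and the penalty grid are all assumptions made for this illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (no intercept; data assumed centered)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_rmse(X, y, lam, k=10, seed=0):
    """Average held-out RMSE of ridge regression over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # all indices not in this fold
        beta = ridge_fit(X[train], y[train], lam)
        resid = y[fold] - X[fold] @ beta
        errors.append(np.sqrt(np.mean(resid**2)))
    return float(np.mean(errors))

# Hypothetical synthetic data: 100 observations, 5 predictors, 2 of them inert.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=100)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]             # candidate penalty values
scores = {lam: cv_rmse(X, y, lam) for lam in grid}
best_lam = min(scores, key=scores.get)
print(best_lam, scores[best_lam])
```

Using the same `seed` for every candidate keeps the fold assignment fixed, so the penalties are compared on identical train/test splits.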


2020 ◽  
Vol 4 (1) ◽  
Author(s):  
Andrew J. Gregory ◽  
Emma S. Spence

Spatial statistics and experimental design are among the most important topics students in the environmental and ecological sciences learn and utilize throughout their careers. These topics are also among the most difficult for students to learn, often due to the use of contrived data sets that present simplified and unrealistic scenarios that fail to engage students in higher level thinking. One way to engage students in higher level thinking is to use an inquiry-based pedagogical framework. The use of inquiry as a pedagogical approach should be instinctive for most scientists, as it mimics how science is conducted, yet most instructors continue to use lecture-based, textbook-driven instructional formats. This type of approach is efficient in covering material, but it suffers in its ability to engage students or enhance learning. Using a Bigfoot data set in an inquiry-based framework, students in a cross-listed graduate/undergraduate statistics class learned ordinary least squares regression and geographically weighted regression techniques. These techniques are among the most frequently applied analyses in the natural sciences. The use of a Bigfoot data set engaged students’ interest, rendering the prospect of learning regression topics as an emergent property of their interest and engagement. This approach also has an additional benefit in that students learned not only key statistical concepts but also how to self-diagnose deficiencies with their models and how to identify strategies to overcome these deficiencies. We hope that both instructors and students in graduate and undergraduate statistics or spatial modeling courses find this case study, and included data sets, a useful and interesting approach to teach and learn regression and spatial regression.


2011 ◽  
Vol 93 (6) ◽  
pp. 409-417 ◽  
Author(s):  
PASCAL CROISEAU ◽  
ANDRÉS LEGARRA ◽  
FRANÇOIS GUILLAUME ◽  
SÉBASTIEN FRITZ ◽  
AURÉLIA BAUR ◽  
...  

Summary. For genomic selection methods, the statistical challenge is to estimate the effect of each of the available single-nucleotide polymorphisms (SNPs). In a context where the number of SNPs (p) is much higher than the number of bulls (n), this task may lead to a poor estimation of these SNP effects if, as for genomic BLUP (gBLUP), all SNPs have a non-null effect. An alternative is to use approaches that have been developed specifically to solve the ‘p>>n’ problem. This is the case of variable selection methods and among them, we focus on the Elastic-Net (EN) algorithm, which is a penalized regression approach. Performances of EN, gBLUP and pedigree-based BLUP were compared with data from three French dairy cattle breeds, giving very encouraging results for EN. We tried to push further the idea of improving SNP effect estimates by considering fewer of them. This variable selection strategy was considered both in the case of gBLUP and EN by adding an SNP pre-selection step based on quantitative trait locus (QTL) detection. Similar results were observed with or without a pre-selection step, in terms of correlations between direct genomic value (DGV) and observed daughter yield deviation in a validation data set. However, when applied to the EN algorithm, this strategy led to a substantial reduction of the number of SNPs included in the prediction equation. In a context where the number of genotyped animals and the number of SNPs get larger and larger, SNP pre-selection strongly alleviates computing requirements and ensures that national evaluations can be completed within a reasonable time frame.


2018 ◽  
Vol 11 (2) ◽  
pp. 1233-1250 ◽  
Author(s):  
Cheng Wu ◽  
Jian Zhen Yu

Abstract. Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low-R² XY dataset. The importance of a proper weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.
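The textbook closed-form Deming regression estimate can be sketched as below. This is not the authors' Igor Pro program. The argument `delta` is assumed here to be the ratio of the y-error variance to the x-error variance; conventions differ between references, so its correspondence to the paper's λ should not be taken for granted. The function name and sample data are hypothetical.

```python
import math

def deming_fit(x, y, delta=1.0):
    """Return (slope, intercept) of a Deming regression.

    delta is the assumed ratio var(y errors) / var(x errors);
    delta = 1 corresponds to orthogonal regression.
    Requires a nonzero sample covariance between x and y.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff**2 + 4.0 * delta * sxy**2)) / (2.0 * sxy)
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical measurements with error in both variables.
x_obs = [1.0, 2.1, 2.9, 4.2, 5.0]
y_obs = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = deming_fit(x_obs, y_obs, delta=1.0)
print(slope, intercept)
```

Because the slope depends on `delta`, a misspecified error-variance ratio shifts both the slope and, through it, the intercept, which is the sensitivity the abstract describes.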


2020 ◽  
Author(s):  
Yihuan Huang ◽  
Amanda Kay Montoya

Machine learning methods are being increasingly adopted in psychological research. Lasso performs variable selection and regularization, and is particularly appealing to psychology researchers because of its connection to linear regression. Researchers often assume that lasso shares the properties of linear regression; however, we demonstrate that this does not hold for models with categorical predictors. Specifically, the coding strategy used for categorical predictors impacts the performance of lasso but not that of linear regression. Group lasso is an alternative to lasso for models with categorical predictors. We demonstrate the inconsistency of lasso and group lasso models using a real data set: lasso performs different variable selection and has different prediction accuracy depending on the coding strategy, and group lasso performs consistent variable selection but has different prediction accuracy. Additionally, group lasso may include many predictors when very few are needed, leading to overfitting. Using Monte Carlo simulation, we show that categorical variables with one group mean differing from all others (one dominant group) are more likely to be included in the model by group lasso than lasso, leading to overfitting. This effect is strongest when the mean difference is large and there are many categories. Researchers primarily focus on the similarity between linear regression and lasso, but pay little attention to their different properties. This project demonstrates that when using lasso and group lasso, the effect of coding strategies should be considered. We conclude with recommended solutions to this issue and future directions of exploration to improve implementation of machine learning approaches in psychological science.
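A toy example can show why the coding strategy matters for lasso but not for ordinary linear regression. Two dummy codings that differ only in the reference category give identical least squares fitted values, yet the L1 norm of the coefficients, which is the quantity lasso penalizes, differs between them. The data and helper names below are hypothetical and are not the data set analyzed in the paper.

```python
import numpy as np

groups = np.array(["A", "A", "B", "B", "C", "C"])
y = np.array([0.0, 2.0, 4.0, 6.0, 4.0, 6.0])     # group means: A=1, B=5, C=5

def dummy_design(groups, reference):
    """Intercept plus 0/1 indicators for every level except the reference."""
    levels = [g for g in sorted(set(groups)) if g != reference]
    cols = [np.ones(len(groups))]
    cols += [(groups == lv).astype(float) for lv in levels]
    return np.column_stack(cols)

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

X_ref_a = dummy_design(groups, "A")
X_ref_b = dummy_design(groups, "B")
beta_a = ols(X_ref_a, y)          # intercept 1, then offsets for B and C
beta_b = ols(X_ref_b, y)          # intercept 5, then offsets for A and C

fitted_a = X_ref_a @ beta_a
fitted_b = X_ref_b @ beta_b

# L1 norm of the non-intercept coefficients (the intercept is
# conventionally left unpenalized).
l1_a = np.abs(beta_a[1:]).sum()
l1_b = np.abs(beta_b[1:]).sum()
print(np.allclose(fitted_a, fitted_b), l1_a, l1_b)
```

Since the same fit costs a different penalty under each coding, a lasso solver can select different dummy variables depending on which category is chosen as the reference.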


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Sican Xiong ◽  
Meng Wang ◽  
Jun Zou ◽  
Jinling Meng ◽  
Yanyan Liu

Improving the prediction accuracy of a complex trait of interest is key to performing genomic selection (GS) for crop breeding. For a complex trait measured in multiple environments, this paper proposes a two-stage method to solve a linear model that jointly models the genetic effects and the genotype × environment interaction (G × E) effects. In the first stage, the least absolute shrinkage and selection operator (LASSO) penalized method was utilized to identify quantitative trait loci (QTL). Then, the ordinary least squares (OLS) approach was used in the second stage to reestimate the QTL effects. As a case study, this approach was used to improve the prediction accuracies of flowering time (FT), oil content (OC), and seed yield per plant (SY) in Brassica napus (B. napus). The results showed that the G × E effects reduced the mean squared error (MSE) significantly. Numerous QTL were environment-specific and presented minor effects. On average, the two-stage method, named OLS post-LASSO, offers the highest prediction accuracies (correlations are 0.8789, 0.9045, and 0.5507 for FT, OC, and SY, respectively). It was followed by the marker × environment interaction (M × E) genomic best linear unbiased prediction (GBLUP) model (correlations are 0.8347, 0.8205, and 0.4005 for FT, OC, and SY, respectively), the LASSO method (correlations are 0.7583, 0.7755, and 0.2718 for FT, OC, and SY, respectively), and the stratified GBLUP model (correlations are 0.6789, 0.6361, and 0.2860 for FT, OC, and SY, respectively). The two-stage method showed an obvious improvement in prediction accuracy, and this study provides methods and a reference for improving GS in breeding.
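The two-stage idea, LASSO for selection followed by an OLS refit on the selected predictors, can be sketched generically as below. This is a simplified illustration on synthetic data, without the genetic and G × E structure of the paper's model. The coordinate descent solver, the function names and the penalty value `lam=0.3` are assumptions made for this example.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with predictor j's contribution added back.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            # Soft-threshold, then rescale by the column's mean square.
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

def ols_post_lasso(X, y, lam):
    """Stage 1: select predictors with LASSO. Stage 2: refit them by OLS."""
    beta_lasso = lasso_cd(X, y, lam)
    selected = np.flatnonzero(beta_lasso)
    beta = np.zeros(X.shape[1])
    if selected.size:
        beta[selected] = np.linalg.lstsq(X[:, selected], y, rcond=None)[0]
    return beta, selected

# Hypothetical synthetic data: 8 predictors, only two with real effects.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
true_beta = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_beta + rng.normal(scale=0.5, size=200)

beta_hat, selected = ols_post_lasso(X, y, lam=0.3)
print(selected, beta_hat[selected])
```

The OLS refit removes the shrinkage bias that LASSO imposes on the coefficients it keeps, which is the usual motivation for a second stage.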


Author(s):  
PETER J. SMITH

In survival analysis, proportional hazards models are commonly used in a distribution-free way for regression where the response variable has been right-censored. However, often a simple linear model may be more appropriate for the data. We present an overview of the linear regression method of Buckley and James (1979) as an interesting distribution-free alternative. It is a method providing consistent parameter estimates, and, through simulation, has been successfully appraised in the literature in comparison to other regression methods. The model is fitted iteratively, and censored points in the scatterplot are moved to estimated positions as if they had been observed without censoring. In this paper we will concentrate on scatterplot effects, model fitting and diagnostics.

