scholarly journals Proposing Robust LAD-Atan Penalty of Regression Model Estimation for High Dimensional Data

2020 ◽  
Vol 17 (2) ◽  
pp. 0550
Author(s):  
Ali Hameed Yousef ◽  
Omar Abdulmohsin Ali

         The issue of penalized regression model has received considerable critical attention to variable selection. It plays an essential role in dealing with high dimensional data. Arctangent denoted by the Atan penalty has been used in both estimation and variable selection as an efficient method recently. However, the Atan penalty is very sensitive to outliers in response to variables or heavy-tailed error distribution. While the least absolute deviation is a good method to get robustness in regression estimation. The specific objective of this research is to propose a robust Atan estimator from combining these two ideas at once. Simulation experiments and real data applications show that the proposed LAD-Atan estimator has superior performance compared with other estimators.  

2020 ◽  
Vol 24 (5) ◽  
pp. 993-1010
Author(s):  
Hejie Lei ◽  
Xingke Chen ◽  
Ling Jian

Least absolute shrinkage and selection operator (LASSO) is one of the most commonly used methods for shrinkage estimation and variable selection. Robust variable selection methods via penalized regression, such as least absolute deviation LASSO (LAD-LASSO), etc., have gained growing attention in works of literature. However those penalized regression procedures are still sensitive to noisy data. Furthermore, “concept drift” makes learning from streaming data fundamentally different from the traditional batch learning. Focusing on the shrinkage estimation and variable selection tasks on noisy streaming data, this paper presents a noise-resilient online learning regression model, i.e. canal-LASSO. Comparing with the LASSO and LAD-LASSO, canal-LASSO is resistant to noisy data in both explanatory variables and response variables. Extensive simulation studies demonstrate satisfactory sparseness and noise-resilient performances of canal-LASSO.


2013 ◽  
Vol 444-445 ◽  
pp. 604-609
Author(s):  
Guang Hui Fu ◽  
Pan Wang

LASSO is a very useful variable selection method for high-dimensional data , But it does not possess oracle property [Fan and Li, 200 and group effect [Zou and Hastie, 200. In this paper, we firstly review four improved LASSO-type methods which satisfy oracle property and (or) group effect, and then give another two new ones called WFEN and WFAEN. The performance on both the simulation and real data sets shows that WFEN and WFAEN are competitive with other LASSO-type methods.


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Jan Kalina ◽  
Anna Schlenker

The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection represents a successful methodology for dimensionality reduction, which is suitable for high-dimensional data observed in two or more different groups. Various available versions of the MRMR approach have been designed to search for variables with the largest relevance for a classification task while controlling for redundancy of the selected set of variables. However, usual relevance and redundancy criteria have the disadvantages of being too sensitive to the presence of outlying measurements and/or being inefficient. We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics. Particularly, redundancy is measured by a new regularized version of the coefficient of multiple correlation and relevance is measured by a highly robust correlation coefficient based on the least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets. To investigate the influence of noise or outliers on the data, we perform the computations also for data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers.


2018 ◽  
Vol 8 (2) ◽  
pp. 377-406
Author(s):  
Almog Lahav ◽  
Ronen Talmon ◽  
Yuval Kluger

Abstract A fundamental question in data analysis, machine learning and signal processing is how to compare between data points. The choice of the distance metric is specifically challenging for high-dimensional data sets, where the problem of meaningfulness is more prominent (e.g. the Euclidean distance between images). In this paper, we propose to exploit a property of high-dimensional data that is usually ignored, which is the structure stemming from the relationships between the coordinates. Specifically, we show that organizing similar coordinates in clusters can be exploited for the construction of the Mahalanobis distance between samples. When the observable samples are generated by a nonlinear transformation of hidden variables, the Mahalanobis distance allows the recovery of the Euclidean distances in the hidden space. We illustrate the advantage of our approach on a synthetic example where the discovery of clusters of correlated coordinates improves the estimation of the principal directions of the samples. Our method was applied to real data of gene expression for lung adenocarcinomas (lung cancer). By using the proposed metric we found a partition of subjects to risk groups with a good separation between their Kaplan–Meier survival plot.


Sign in / Sign up

Export Citation Format

Share Document