Outlier detection and variable selection via difference based regression model and penalized regression

2018 ◽  
Vol 29 (3) ◽  
pp. 815-825 ◽  
Author(s):  
InHae Choi ◽  
Chun Gun Park ◽  
Kyeong Eun Lee


2020 ◽  
Vol 24 (5) ◽  
pp. 993-1010
Author(s):  
Hejie Lei ◽  
Xingke Chen ◽  
Ling Jian

The least absolute shrinkage and selection operator (LASSO) is one of the most commonly used methods for shrinkage estimation and variable selection. Robust variable selection methods based on penalized regression, such as the least absolute deviation LASSO (LAD-LASSO), have gained growing attention in the literature. However, those penalized regression procedures are still sensitive to noisy data. Furthermore, “concept drift” makes learning from streaming data fundamentally different from traditional batch learning. Focusing on shrinkage estimation and variable selection on noisy streaming data, this paper presents a noise-resilient online learning regression model, canal-LASSO. Compared with LASSO and LAD-LASSO, canal-LASSO is resistant to noise in both the explanatory variables and the response variable. Extensive simulation studies demonstrate the satisfactory sparseness and noise resilience of canal-LASSO.
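
For readers unfamiliar with the baseline that this abstract compares against, the following is a minimal sketch of the standard batch LASSO fitted by cyclic coordinate descent. It is not canal-LASSO and not the authors' code; the objective scaling (1/(2n)), the function names and the toy data are assumptions made for illustration only.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p features each; y is a list of n responses.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # inner product of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta


# Hypothetical toy data: the response depends only on the first feature.
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.1], [4.0, -0.2]]
y = [2.0, 4.0, 6.0, 8.0]
beta = lasso_coordinate_descent(X, y, lam=0.1)
```

On this toy data the coefficient of the irrelevant second feature is set exactly to zero, while the first coefficient is shrunk slightly below its true value of 2. That is the "shrinkage and selection" behaviour the abstract refers to.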


2021 ◽  
pp. 114696
Author(s):  
M. Kashani ◽  
M. Arashi ◽  
M.R. Rabiei ◽  
P. D’Urso ◽  
L. De Giovanni

2020 ◽  
Vol 17 (2) ◽  
pp. 0550
Author(s):  
Ali Hameed Yousef ◽  
Omar Abdulmohsin Ali

Penalized regression has received considerable attention as a tool for variable selection, and it plays an essential role in dealing with high-dimensional data. The arctangent (Atan) penalty has recently been used as an efficient method for both estimation and variable selection. However, the Atan penalty is very sensitive to outliers in the response variable or heavy-tailed error distributions, whereas least absolute deviation (LAD) is a good way to obtain robustness in regression estimation. The specific objective of this research is to propose a robust Atan estimator that combines these two ideas. Simulation experiments and real data applications show that the proposed LAD-Atan estimator has superior performance compared with other estimators.
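
To make the combination of the two ideas concrete, here is a rough sketch of what a LAD loss plus an Atan penalty could look like. The penalty form lam * (gamma + 2/pi) * arctan(|b| / gamma) is the one commonly given in the Atan literature; the factor n on the penalty term, the function names and the parameter values are assumptions, and the abstract does not confirm that the authors' estimator uses exactly this objective.

```python
import math


def atan_penalty(b, lam, gamma):
    """Atan penalty of one coefficient (assumed form, see the note above)."""
    return lam * (gamma + 2.0 / math.pi) * math.atan(abs(b) / gamma)


def lad_atan_objective(X, y, beta, lam, gamma):
    """Sum of absolute residuals plus n times the summed Atan penalty.

    The scaling by n is an illustrative assumption, not taken from the paper.
    """
    lad = sum(
        abs(yi - sum(xij * bj for xij, bj in zip(xi, beta)))
        for xi, yi in zip(X, y)
    )
    penalty = sum(atan_penalty(b, lam, gamma) for b in beta)
    return lad + len(y) * penalty
```

Two properties are worth noting. The penalty is bounded, so large coefficients are not penalized ever more heavily as they are under an L1 penalty. The absolute-residual loss grows only linearly in an outlying response, which is the source of the robustness the abstract mentions.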


Author(s):  
Alain J Mbebi ◽  
Hao Tong ◽  
Zoran Nikoloski

Motivation: Genomic selection (GS) is currently deemed the most effective approach to speed up breeding of agricultural varieties. It has been recognized that consideration of multiple traits in GS can improve accuracy of prediction for traits of low heritability. However, since GS forgoes statistical testing with the idea of improving predictions, it does not facilitate mechanistic understanding of the contribution of particular single nucleotide polymorphisms (SNPs). Results: Here, we propose an L2,1-norm regularized multivariate regression model and devise a fast and efficient iterative optimization algorithm, called L2,1-joint, applicable in multi-trait GS. The usage of the L2,1-norm facilitates variable selection in a penalized multivariate regression that considers the relation between individuals, when the number of SNPs is much larger than the number of individuals. The capacity for variable selection allows us to define master regulators that can be used in a multi-trait GS setting to dissect the genetic architecture of the analyzed traits. Our comparative analyses demonstrate that the proposed model is a favorable candidate compared to existing state-of-the-art approaches. Prediction and variable selection with datasets from Brassica napus, wheat and Arabidopsis thaliana diversity panels are conducted to further showcase the performance of the proposed model. Availability and implementation: The model is implemented in the R programming language and the code is freely available from https://github.com/alainmbebi/L21-norm-GS. Supplementary information: Supplementary data are available at Bioinformatics online.
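
As a small illustration of why an L2,1-norm penalty selects variables jointly across traits, the sketch below computes the norm of a coefficient matrix and applies row-wise group soft-thresholding, the standard proximal step for this norm. It is written in Python rather than R and is not the L2,1-joint algorithm from the repository above. The layout (rows as SNPs, columns as traits), the row-wise convention for the norm and the function names are assumptions.

```python
import math


def l21_norm(B):
    """Sum over rows of the Euclidean norm of each row of B."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in B)


def row_group_soft_threshold(B, t):
    """Shrink each row of B toward zero by t in Euclidean norm.

    Rows whose norm is at most t become exactly zero, which removes that
    variable (here: SNP) from every response (here: trait) at once.
    """
    out = []
    for row in B:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out
```

For example, with threshold t = 1 a row (3, 4) of norm 5 is scaled by 0.8, while a row (0.3, 0.4) of norm 0.5 is zeroed out entirely.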


2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Pingping Gao ◽  
Yabin Gao

This paper presents a fuzzy regression analysis method based on general quadrilateral interval type-2 fuzzy numbers, with a view to data outlier detection. The Euclidean distance for general quadrilateral interval type-2 fuzzy numbers is provided. In the sense of this Euclidean distance, parameter estimation laws for the type-2 fuzzy linear regression model are designed. Then, an outlier-detection-oriented parameter estimation method is proposed using a data deletion-based type-2 fuzzy regression model. Moreover, based on the fuzzy regression model and the root mean squared error, an impact evaluation rule is designed for detecting data outliers. Finally, an example is provided to validate the presented methods.
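
The general idea of deletion-based outlier screening with a root mean squared error criterion can be sketched with an ordinary (crisp) simple linear regression. This is a stand-in for illustration only: it does not use type-2 fuzzy numbers, and the scoring rule, function names and data below are assumptions rather than the impact evaluation rule designed in the paper.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on the given points."""
    return math.sqrt(
        sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)
    )


def deletion_rmse(xs, ys):
    """For each i, refit without point i and return the RMSE on the remaining points."""
    scores = []
    for i in range(len(xs)):
        xr = xs[:i] + xs[i + 1:]
        yr = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(xr, yr)
        scores.append(rmse(xr, yr, slope, intercept))
    return scores


# Hypothetical data on the line y = 2x + 1, with one corrupted response at index 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 30.0, 9.0, 11.0]
scores = deletion_rmse(xs, ys)
suspect = min(range(len(scores)), key=scores.__getitem__)
```

Here `suspect` is the index whose deletion leaves the best-fitting model. In this toy data, removing the corrupted point drops the RMSE to essentially zero, so it is the one flagged.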


2020 ◽  
Vol 12 (3) ◽  
pp. 376-398 ◽  
Author(s):  
Takumi Saegusa ◽  
Tianzhou Ma ◽  
Gang Li ◽  
Ying Qing Chen ◽  
Mei-Ling Ting Lee
