Robust Group Identification and Variable Selection in Regression

2017 ◽  
Vol 2017 ◽  
pp. 1-8
Author(s):  
Ali Alkenani ◽  
Tahir R. Dikheel

The elimination of insignificant predictors and the combination of predictors with indistinguishable coefficients are the two issues raised in searching for the true model. Pairwise Absolute Clustering and Sparsity (PACS) achieves both goals. Unfortunately, PACS is sensitive to outliers because it relies on the least-squares loss function, which is known to be very sensitive to unusual data. In this article, the sensitivity of PACS to outliers is studied. Robust versions of PACS (RPACS) are proposed by replacing the least-squares loss with MM-estimation and the nonrobust weights with weights based on robust correlations instead of the Pearson correlation. A simulation study and two real data applications are used to assess the effectiveness of the proposed methods.
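For reference, the PACS penalty combines a weighted lasso term with pairwise fused terms on coefficient differences and sums, which is what drives both sparsity and the grouping of indistinguishable coefficients. A minimal sketch of the penalty value (the weight names here are illustrative, not the authors' notation):

```python
import numpy as np

def pacs_penalty(beta, lam, w_single, w_diff, w_sum):
    """Value of a PACS-style penalty:

    lam * ( sum_j  w_single[j]  * |beta_j|
          + sum_{j<k} w_diff[j,k] * |beta_j - beta_k|
          + sum_{j<k} w_sum[j,k]  * |beta_j + beta_k| )

    The lasso term removes insignificant predictors; the pairwise terms
    pull similar coefficients together (grouping).
    """
    beta = np.asarray(beta, dtype=float)
    p = beta.size
    total = np.sum(w_single * np.abs(beta))
    for j in range(p):
        for k in range(j + 1, p):
            total += w_diff[j, k] * abs(beta[j] - beta[k])
            total += w_sum[j, k] * abs(beta[j] + beta[k])
    return lam * total
```

In the robust variants described above, the weights would be built from robust correlations rather than Pearson correlations, and the least-squares loss this penalty is added to would be replaced by an MM-estimation loss.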

2019 ◽  
Vol 2019 ◽  
pp. 1-7
Author(s):  
Ali Alkenani ◽  
Basim Shlaibah Msallam

Using the Pairwise Absolute Clustering and Sparsity (PACS) penalty, we propose a regularized quantile regression (QR) method, QR-PACS. The PACS penalty achieves the elimination of insignificant predictors and the combination of predictors with indistinguishable coefficients (IC), the two issues raised in searching for the true model. QR-PACS extends PACS from mean regression settings to QR settings. The paper shows that QR-PACS can yield promising predictive precision as well as identify related groups in both simulation and real data.
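The move from mean regression to QR replaces the squared-error loss with the check (pinball) loss, which is minimized by the conditional tau-quantile rather than the conditional mean. A minimal sketch of that loss:

```python
import numpy as np

def pinball_loss(residuals, tau):
    """Quantile regression check (pinball) loss at quantile level tau.

    Positive residuals are weighted by tau, negative ones by (1 - tau),
    so minimizing the average loss targets the tau-th conditional quantile.
    """
    r = np.asarray(residuals, dtype=float)
    return np.mean(np.where(r >= 0, tau * r, (tau - 1) * r))
```

QR-PACS, as described above, would minimize this loss plus the PACS penalty instead of the least-squares loss plus the PACS penalty.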


2021 ◽  
Vol 19 (1) ◽  
pp. 2-15
Author(s):  
Tahir R. Dikheel ◽  
Alaa Q. Yaseen

The lag-weighted lasso was introduced to deal with lag effects when identifying the true model in time series. This method depends on weights that reflect both the coefficient size and the lag effects. However, the lag-weighted lasso is not robust. To overcome this problem, we propose robust lag-weighted lasso methods. Both the simulation study and the real data example show that the proposed methods outperform the existing methods.


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Amer Ibrahim Al-Omari ◽  
SidAhmed Benchiha ◽  
Ibrahim M. Almanjahie

Ranked set sampling is a very useful method for collecting data when the actual measurement of the units in a population is difficult or expensive. Recently, the generalized quasi-Lindley distribution was suggested as a new continuous lifetime distribution. In this article, the ranked set sampling method is considered for estimating the parameters of the generalized quasi-Lindley distribution. Several estimation methods are used, including the maximum likelihood, maximum product of spacings, ordinary least squares, weighted least squares, Cramer–von Mises, and Anderson–Darling methods. The performance of the proposed ranked set sampling-based estimators is assessed through a simulation study, in terms of bias and mean squared error, against simple random sampling. Additional results are obtained from real data on the survival times of 72 guinea pigs and 23 ball bearings. The simulation study results and the real data applications showed the superiority of the proposed ranked set sampling estimators over their simple random sample competitors based on the same number of measured units.
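The ranked set sampling scheme itself is simple: in each cycle, draw as many random sets as the set size, rank each set cheaply, and fully measure only one order statistic per set. A minimal sketch, assuming ranking is done on the true values (in practice it is done visually or via an inexpensive concomitant variable):

```python
import numpy as np

def ranked_set_sample(population, set_size, cycles, rng=None):
    """Draw a ranked set sample of size set_size * cycles.

    Each cycle draws set_size random sets of set_size units; the i-th set
    contributes its i-th smallest unit to the measured sample.
    """
    rng = np.random.default_rng(rng)
    measured = []
    for _ in range(cycles):
        for i in range(set_size):
            s = rng.choice(population, size=set_size, replace=False)
            measured.append(np.sort(s)[i])  # keep the i-th order statistic
    return np.array(measured)
```

The comparison in the abstract is between estimators computed on such a sample and estimators computed on a simple random sample with the same number of measured units.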


2021 ◽  
Vol 19 (1) ◽  
pp. 2-26
Author(s):  
Amal S. Hassan ◽  
Saeed Elsayed Hemeda ◽  
Said G. Nassr

In this study, an extended exponentiated Pareto distribution is proposed. Some statistical properties are derived. We consider maximum likelihood, least squares, weighted least squares, and Bayesian estimators. A simulation study is implemented to investigate the accuracy of the different estimators. An application of the proposed distribution to real data is presented.


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, estimation of its probability density function and cumulative distribution function is considered using five different methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS), and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulations based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others, and that when the sample size is large enough the ML and UMVU estimators are almost equivalent and more efficient than the LS, WLS, and PC estimators. Finally, results from a real data set are analyzed.


2021 ◽  
Author(s):  
Jakob Raymaekers ◽  
Peter J. Rousseeuw

Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.
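For context, the classical Yeo–Johnson transform that the robust proposal builds on handles both signs of the data with a single parameter lambda. A minimal sketch of the standard transform (not the authors' modified version):

```python
import numpy as np

def yeo_johnson(x, lam):
    """Classical Yeo–Johnson transform with parameter lam.

    For x >= 0 it behaves like a shifted Box–Cox transform; for x < 0 it
    applies the Box–Cox transform with parameter (2 - lam) to (1 - x).
    lam = 1 leaves the data essentially unchanged.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if abs(lam) > 1e-12:
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(x[pos])
    if abs(lam - 2.0) > 1e-12:
        out[~pos] = -((1.0 - x[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out
```

The robust method described above keeps this functional form but estimates lambda so that the central part of the data, rather than the outliers, determines the fit.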


2021 ◽  
Vol 5 (1) ◽  
pp. 59
Author(s):  
Gaël Kermarrec ◽  
Niklas Schild ◽  
Jan Hartmann

Terrestrial laser scanners (TLS) capture a large number of 3D points rapidly, with high precision and spatial resolution. These scanners are used for applications as diverse as modeling architectural or engineering structures and high-resolution mapping of terrain. The noise of the observations cannot be assumed to be strictly white: besides being heteroscedastic, correlations between observations are likely to appear due to the high scanning rate. Unfortunately, while the variance can sometimes be modeled from physical or empirical considerations, the correlations are more often neglected. Trustworthy knowledge of both is, however, mandatory to avoid overestimating the precision of the point cloud and, potentially, failing to detect deformation between scans recorded at different epochs using statistical testing strategies. The TLS point clouds can be approximated with parametric surfaces, such as planes, using the Gauss–Helmert model, or with the newly introduced T-splines surfaces. In both cases, the goal is to minimize the squared distance between the observations and the approximated surfaces in order to estimate parameters such as normal vectors or control points. In this contribution, we show how the residuals of the surface approximation can be used to derive the correlation structure of the noise of the observations. We estimate the correlation parameters using the Whittle maximum likelihood and use simulations and real data to validate our methodology. Using the least-squares adjustment as a "filter of the geometry" paves the way for the determination of a correlation model for many sensors recording 3D point clouds.
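The Whittle likelihood approximates the Gaussian time-series likelihood in the frequency domain, comparing the periodogram of the residuals to a parametric spectral density. A minimal sketch for an AR(1) correlation model, fitted by grid search (an illustrative stand-in for the paper's actual correlation model and optimizer):

```python
import numpy as np

def whittle_ar1(residuals):
    """Estimate an AR(1) coefficient by minimizing the Whittle
    negative log-likelihood sum_k [ log f(w_k) + I(w_k) / f(w_k) ],
    where I is the periodogram and f the AR(1) spectral density
    f(w) = sigma^2 / (2*pi*(1 - 2*phi*cos(w) + phi^2)).
    """
    x = np.asarray(residuals, dtype=float)
    n = x.size
    # Periodogram at positive Fourier frequencies (frequency 0 excluded).
    freqs = 2.0 * np.pi * np.arange(1, n // 2) / n
    per = np.abs(np.fft.fft(x)[1:n // 2]) ** 2 / (2.0 * np.pi * n)
    best_phi, best_nll = 0.0, np.inf
    for phi in np.linspace(-0.99, 0.99, 199):
        shape = 1.0 / (2.0 * np.pi * (1.0 - 2.0 * phi * np.cos(freqs) + phi ** 2))
        sigma2 = np.mean(per / shape)  # profile out the innovation variance
        f = sigma2 * shape
        nll = np.sum(np.log(f) + per / f)
        if nll < best_nll:
            best_phi, best_nll = phi, nll
    return best_phi
```

In the workflow described above, the residuals would come from the least-squares surface approximation, and the fitted correlation model would then feed the stochastic model used for statistical testing.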


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Camilo Broc ◽  
Therese Truong ◽  
Benoit Liquet

Background: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of sparse group Partial Least Squares (sgPLS) that takes groups of variables into account, together with a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at the variable level and at the group level. Results: Our method has the advantage of proposing a globally readable model while coping with the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example of application on real data with the aim of highlighting common susceptibility variants to breast and thyroid cancers. Conclusion: The joint-sgPLS shows interesting properties for detecting a signal. As an extension of PLS, the method is suited for data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is adapted to any data with a large number of variables and an explicit a priori architecture in other application fields.

