Robust Group Identification and Variable Selection in Regression

2017 ◽  
Vol 2017 ◽  
pp. 1-8
Author(s):  
Ali Alkenani ◽  
Tahir R. Dikheel

The elimination of insignificant predictors and the combination of predictors with indistinguishable coefficients are the two issues raised in searching for the true model. Pairwise Absolute Clustering and Sparsity (PACS) achieves both goals. Unfortunately, PACS is sensitive to outliers because it relies on the least-squares loss function, which is known to be very sensitive to unusual data. In this article, the sensitivity of PACS to outliers is studied. Robust versions of PACS (RPACS) are proposed by replacing the least-squares loss with MM-estimation and the nonrobust weights with weights based on robust correlations instead of the Pearson correlation. A simulation study and two real data applications are used to assess the effectiveness of the proposed methods.
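For reference, the PACS penalty combines a weighted lasso term with pairwise fused terms on coefficient differences and sums, which is what drives both sparsity and the grouping of indistinguishable coefficients. A minimal sketch of the penalty value (the weight names here are illustrative, not the authors' notation):

```python
import numpy as np

def pacs_penalty(beta, lam, w_single, w_diff, w_sum):
    """Value of a PACS-style penalty:

    lam * ( sum_j  w_single[j]  * |beta_j|
          + sum_{j<k} w_diff[j,k] * |beta_j - beta_k|
          + sum_{j<k} w_sum[j,k]  * |beta_j + beta_k| )

    The lasso term removes insignificant predictors; the pairwise terms
    pull similar coefficients together (grouping).
    """
    beta = np.asarray(beta, dtype=float)
    p = beta.size
    total = np.sum(w_single * np.abs(beta))
    for j in range(p):
        for k in range(j + 1, p):
            total += w_diff[j, k] * abs(beta[j] - beta[k])
            total += w_sum[j, k] * abs(beta[j] + beta[k])
    return lam * total
```

In the robust variants described above, the weights would be built from robust correlations rather than Pearson correlations, and the least-squares loss this penalty is added to would be replaced by an MM-estimation loss.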

2019 ◽  
Vol 2019 ◽  
pp. 1-7
Author(s):  
Ali Alkenani ◽  
Basim Shlaibah Msallam

Using the Pairwise Absolute Clustering and Sparsity (PACS) penalty, we propose a regularized quantile regression (QR) method, QR-PACS. The PACS penalty achieves the elimination of insignificant predictors and the combination of predictors with indistinguishable coefficients (IC), the two issues raised in searching for the true model. QR-PACS extends PACS from mean regression settings to QR settings. The paper shows that QR-PACS can yield promising predictive precision as well as identify related groups in both simulation and real data.
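The move from mean regression to QR replaces the squared-error loss with the check (pinball) loss, which is minimized by the conditional tau-quantile rather than the conditional mean. A minimal sketch of that loss:

```python
import numpy as np

def pinball_loss(residuals, tau):
    """Quantile regression check (pinball) loss at quantile level tau.

    Positive residuals are weighted by tau, negative ones by (1 - tau),
    so minimizing the average loss targets the tau-th conditional quantile.
    """
    r = np.asarray(residuals, dtype=float)
    return np.mean(np.where(r >= 0, tau * r, (tau - 1) * r))
```

QR-PACS, as described above, would minimize this loss plus the PACS penalty instead of the least-squares loss plus the PACS penalty.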


2021 ◽  
Vol 19 (1) ◽  
pp. 2-15
Author(s):  
Tahir R. Dikheel ◽  
Alaa Q. Yaseen

The lag-weighted lasso was introduced to deal with lag effects when identifying the true model in time series. This method depends on weights that reflect both the coefficient size and the lag effects. However, the lag-weighted lasso is not robust. To overcome this problem, we propose robust lag-weighted lasso methods. Both the simulation study and the real data example show that the proposed methods outperform the existing methods.


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Amer Ibrahim Al-Omari ◽  
SidAhmed Benchiha ◽  
Ibrahim M. Almanjahie

Ranked set sampling is a very useful method for collecting data when the actual measurement of the units in a population is difficult or expensive. Recently, the generalized quasi-Lindley distribution was suggested as a new continuous lifetime distribution. In this article, the ranked set sampling method is considered for estimating the parameters of the generalized quasi-Lindley distribution. Several estimation methods are used, including the maximum likelihood, maximum product of spacings, ordinary least squares, weighted least squares, Cramer–von Mises, and Anderson–Darling methods. The performance of the proposed ranked set sampling-based estimators is assessed through a simulation study, in terms of bias and mean squared error, against simple random sampling. Additional results are obtained from real data on the survival times of 72 guinea pigs and 23 ball bearings. The simulation study results and the real data applications showed the superiority of the proposed ranked set sampling estimators over their simple random sample competitors based on the same number of measured units.
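The ranked set sampling scheme itself is simple: in each cycle, draw as many random sets as the set size, rank each set cheaply, and fully measure only one order statistic per set. A minimal sketch, assuming ranking is done on the true values (in practice it is done visually or via an inexpensive concomitant variable):

```python
import numpy as np

def ranked_set_sample(population, set_size, cycles, rng=None):
    """Draw a ranked set sample of size set_size * cycles.

    Each cycle draws set_size random sets of set_size units; the i-th set
    contributes its i-th smallest unit to the measured sample.
    """
    rng = np.random.default_rng(rng)
    measured = []
    for _ in range(cycles):
        for i in range(set_size):
            s = rng.choice(population, size=set_size, replace=False)
            measured.append(np.sort(s)[i])  # keep the i-th order statistic
    return np.array(measured)
```

The comparison in the abstract is between estimators computed on such a sample and estimators computed on a simple random sample with the same number of measured units.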


2021 ◽  
Vol 19 (1) ◽  
pp. 2-26
Author(s):  
Amal S. Hassan ◽  
Saeed Elsayed Hemeda ◽  
Said G. Nassr

In this study, an extended exponentiated Pareto distribution is proposed. Some statistical properties are derived. We consider maximum likelihood, least squares, weighted least squares, and Bayesian estimators. A simulation study is implemented to investigate the accuracy of the different estimators. An application of the proposed distribution to real data is presented.


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, estimation of its probability density function and cumulative distribution function is considered using five different methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS), and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulations based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others, and that when the sample size is large enough the ML and UMVU estimators are almost equivalent and more efficient than the LS, WLS, and PC estimators. Finally, results from a real data set are analyzed.


2021 ◽  
Author(s):  
Jakob Raymaekers ◽  
Peter J. Rousseeuw

Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.
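For context, the classical Yeo–Johnson transform that the robust proposal builds on handles both signs of the data with a single parameter lambda. A minimal sketch of the standard transform (not the authors' modified version):

```python
import numpy as np

def yeo_johnson(x, lam):
    """Classical Yeo–Johnson transform with parameter lam.

    For x >= 0 it behaves like a shifted Box–Cox transform; for x < 0 it
    applies the Box–Cox transform with parameter (2 - lam) to (1 - x).
    lam = 1 leaves the data essentially unchanged.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if abs(lam) > 1e-12:
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(x[pos])
    if abs(lam - 2.0) > 1e-12:
        out[~pos] = -((1.0 - x[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out
```

The robust method described above keeps this functional form but estimates lambda so that the central part of the data, rather than the outliers, determines the fit.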


2021 ◽  
Vol 5 (1) ◽  
pp. 59
Author(s):  
Gaël Kermarrec ◽  
Niklas Schild ◽  
Jan Hartmann

Terrestrial laser scanners (TLS) capture a large number of 3D points rapidly, with high precision and spatial resolution. These scanners are used for applications as diverse as modeling architectural or engineering structures and high-resolution mapping of terrain. The noise of the observations cannot be assumed to be strictly white: besides being heteroscedastic, correlations between observations are likely to appear due to the high scanning rate. Unfortunately, while the variance can sometimes be modeled from physical or empirical considerations, the correlations are more often neglected. Trustworthy knowledge of both is, however, mandatory to avoid overestimating the precision of the point cloud and, potentially, failing to detect deformation between scans recorded at different epochs using statistical testing strategies. The TLS point clouds can be approximated with parametric surfaces, such as planes, using the Gauss–Helmert model, or with the newly introduced T-splines surfaces. In both cases, the goal is to minimize the squared distance between the observations and the approximated surfaces in order to estimate parameters such as normal vectors or control points. In this contribution, we show how the residuals of the surface approximation can be used to derive the correlation structure of the noise of the observations. We estimate the correlation parameters using the Whittle maximum likelihood and use simulations and real data to validate our methodology. Using the least-squares adjustment as a "filter of the geometry" paves the way for the determination of a correlation model for many sensors recording 3D point clouds.
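The Whittle likelihood approximates the Gaussian time-series likelihood in the frequency domain, comparing the periodogram of the residuals to a parametric spectral density. A minimal sketch for an AR(1) correlation model, fitted by grid search (an illustrative stand-in for the paper's actual correlation model and optimizer):

```python
import numpy as np

def whittle_ar1(residuals):
    """Estimate an AR(1) coefficient by minimizing the Whittle
    negative log-likelihood sum_k [ log f(w_k) + I(w_k) / f(w_k) ],
    where I is the periodogram and f the AR(1) spectral density
    f(w) = sigma^2 / (2*pi*(1 - 2*phi*cos(w) + phi^2)).
    """
    x = np.asarray(residuals, dtype=float)
    n = x.size
    # Periodogram at positive Fourier frequencies (frequency 0 excluded).
    freqs = 2.0 * np.pi * np.arange(1, n // 2) / n
    per = np.abs(np.fft.fft(x)[1:n // 2]) ** 2 / (2.0 * np.pi * n)
    best_phi, best_nll = 0.0, np.inf
    for phi in np.linspace(-0.99, 0.99, 199):
        shape = 1.0 / (2.0 * np.pi * (1.0 - 2.0 * phi * np.cos(freqs) + phi ** 2))
        sigma2 = np.mean(per / shape)  # profile out the innovation variance
        f = sigma2 * shape
        nll = np.sum(np.log(f) + per / f)
        if nll < best_nll:
            best_phi, best_nll = phi, nll
    return best_phi
```

In the workflow described above, the residuals would come from the least-squares surface approximation, and the fitted correlation model would then feed the stochastic model used for statistical testing.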


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Camilo Broc ◽  
Therese Truong ◽  
Benoit Liquet

Background: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of sparse group Partial Least Squares (sgPLS) that takes groups of variables into account, together with a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at the variable level and at the group level. Results: Our method has the advantage of proposing a globally readable model while coping with the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example of application on real data with the aim of highlighting common susceptibility variants to breast and thyroid cancers. Conclusion: The joint-sgPLS shows interesting properties for detecting a signal. As an extension of PLS, the method is suited for data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is adapted to any data with a large number of variables and an explicit a priori architecture in other application fields.

