A High-Dimensional Counterpart for the Ridge Estimator in Multicollinear Situations

Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3057
Author(s):  
Mohammad Arashi ◽  
Mina Norouzirad ◽  
Mahdi Roozbeh ◽  
Naushad Mamode Mamode Khan

The ridge regression estimator is a commonly used procedure for dealing with multicollinear data. This paper proposes an alternative estimation procedure for high-dimensional multicollinear data. The procedure gives a continuous estimate that includes the ridge estimator as a particular case. We study its asymptotic performance for growing dimension, i.e., p→∞ with n fixed. Under some mild regularity conditions, we prove the proposed estimator’s consistency and derive its asymptotic properties. Monte Carlo simulation experiments are carried out to assess its performance, and the method is applied to a high-dimensional genetic dataset.

PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0245376
Author(s):  
M. Arashi ◽  
M. Roozbeh ◽  
N. A. Hamzah ◽  
M. Gasparini

With the advancement of technology, the analysis of large-scale gene expression data is feasible and has become very popular in the era of machine learning. This paper develops an improved ridge approach for genome regression modeling. When multicollinearity exists in a data set with outliers, we consider a robust ridge estimator, namely the rank ridge regression estimator, for parameter estimation and prediction. However, the efficiency of the rank ridge regression estimator depends heavily on the ridge parameter, and in general it is difficult to give a satisfactory answer about how to select it. Because of the good properties of generalized cross validation (GCV) and its simplicity, we use it to choose the optimum value of the ridge parameter. The GCV function balances the precision of the estimators against the bias caused by ridge estimation. It behaves like an improved estimator of risk and can be used when the number of explanatory variables is larger than the sample size in high-dimensional problems. Finally, some numerical illustrations are given to support our findings.
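
As a rough illustration of choosing a ridge parameter by GCV, the sketch below scores a grid of candidate values and keeps the minimizer. It uses the ordinary least-squares ridge estimator, not the rank ridge estimator of the paper, and the function name `ridge_gcv` and the grid `ks` are hypothetical. The GCV score is taken in its standard form, (RSS/n) / (1 - tr(H)/n)^2, computed through the SVD of X so that it also works when the number of predictors exceeds the sample size.

```python
import numpy as np

def ridge_gcv(X, y, ks):
    """Pick a ridge parameter from `ks` by GCV (ordinary ridge, no intercept).

    Returns (best_k, beta_hat, gcv_scores). Illustrative only: the rank ridge
    estimator discussed in the abstract is NOT implemented here.
    """
    n = X.shape[0]
    # Thin SVD X = U diag(s) Vt; valid for p > n as well as p <= n.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    scores = []
    for k in ks:
        shrink = s**2 / (s**2 + k)          # eigenvalues of the hat matrix H(k)
        resid = y - U @ (shrink * Uty)      # y - H(k) y
        df = shrink.sum()                   # tr(H(k)), effective degrees of freedom
        scores.append((resid @ resid / n) / (1.0 - df / n) ** 2)
    scores = np.array(scores)
    best_k = ks[int(np.argmin(scores))]
    # beta_hat = (X'X + kI)^{-1} X'y written in SVD form.
    beta_hat = Vt.T @ ((s / (s**2 + best_k)) * Uty)
    return best_k, beta_hat, scores
```

The candidate values must be strictly positive: when the predictors outnumber the observations, tr(H) approaches n as k goes to zero and the denominator of the score vanishes.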


1996 ◽  
Vol 26 (9) ◽  
pp. 1709-1713 ◽  
Author(s):  
Paul C. Van Deusen

Growth modeling of forests at the individual tree and stand levels is a highly refined procedure for many forest types. A method to incorporate predictions from such models into a forest inventory system is developed. Variance components from the actual measurements and from the predicted measurements are used to estimate the variance of the combined predicted value. The only assumption required to justify this method is that the model estimate has a bias that does not change from one time period to the next. The estimation procedure proposed here can also incorporate remotely sensed information via a regression estimator.


2021 ◽  
Vol 0 (0) ◽  
pp. 0
Author(s):  
John Maclean ◽  
Elaine T. Spiller

Many recent advances in sequential assimilation of data into nonlinear high-dimensional models are modifications to particle filters which employ efficient searches of a high-dimensional state space. In this work, we present a complementary strategy that combines statistical emulators and particle filters. The emulators are used to learn and offer a computationally cheap approximation to the forward dynamic mapping. This emulator-particle filter (Emu-PF) approach requires a modest number of forward-model runs, but yields well-resolved posterior distributions even in non-Gaussian cases. We explore several modifications to the Emu-PF that utilize mechanisms for dimension reduction to efficiently fit the statistical emulator, and present a series of simulation experiments on an atypical Lorenz-96 system to demonstrate their performance. We conclude with a discussion on how the Emu-PF can be paired with modern particle filtering algorithms.


1989 ◽  
Vol 26 (2) ◽  
pp. 214-221 ◽  
Author(s):  
Subhash Sharma ◽  
Srinivas Durvasula ◽  
William R. Dillon

The authors report some results on the behavior of alternative covariance structure estimation procedures in the presence of non-normal data. They conducted Monte Carlo simulation experiments with a factorial design involving three levels of skewness, three levels of kurtosis, and three different sample sizes. For normal data, among all the elliptical estimation techniques, elliptical reweighted least squares (ERLS) was equivalent in performance to ML. However, as expected, for non-normal data parameter estimates were unbiased for ML and the elliptical estimation techniques, whereas the bias in standard errors was substantial for GLS and ML. Among elliptical estimation techniques, ERLS was superior in performance. On the basis of the simulation results, the authors recommend that researchers use ERLS for both normal and non-normal data.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Jiali Sun ◽  
Qingtai Wu ◽  
Dafeng Shen ◽  
Yangjun Wen ◽  
Fengrong Liu ◽  
...  

One of the most important tasks in genome-wide association analysis (GWAS) is the detection of single-nucleotide polymorphisms (SNPs) which are related to target traits. With the development of sequencing technology, traditional statistical methods struggle to analyze the corresponding high-dimensional massive data or SNPs. Recently, machine learning methods have become more popular in high-dimensional genetic data analysis for their fast computation speed. However, most machine learning methods have several drawbacks, such as poor generalization ability, over-fitting, unsatisfactory classification and low detection accuracy. This study proposes a two-stage algorithm based on least angle regression and random forest (TSLRF), which first controls for population structure and polygenic effects, then selects the SNPs that are potentially related to target traits using least angle regression (LARS), and finally analyzes this variable subset using random forest (RF) to detect quantitative trait nucleotides (QTNs) associated with target traits. The new method has more detection power in simulation experiments and real data analyses. The simulation results showed that, compared with existing approaches, the new method effectively improved the detection of QTNs and the degree of model fit, and required less calculation time. In addition, the new method significantly distinguished QTNs from other SNPs. Subsequently, the new method was applied to analyze five flowering-related traits in Arabidopsis. The results showed that the distinction between QTNs and unrelated SNPs was more significant than with the other methods. The new method detected 60 genes confirmed to be related to the target trait, which was significantly higher than the other methods, and simultaneously detected multiple gene clusters associated with the target trait.


2007 ◽  
Vol 44 (04) ◽  
pp. 977-989 ◽  
Author(s):  
Peter J. Brockwell ◽  
Richard A. Davis ◽  
Yu Yang

Continuous-time autoregressive moving average (CARMA) processes with a nonnegative kernel and driven by a nondecreasing Lévy process constitute a very general class of stationary, nonnegative continuous-time processes. In financial econometrics a stationary Ornstein-Uhlenbeck (or CAR(1)) process, driven by a nondecreasing Lévy process, was introduced by Barndorff-Nielsen and Shephard (2001) as a model for stochastic volatility to allow for a wide variety of possible marginal distributions and the possibility of jumps. For such processes, we take advantage of the nonnegativity of the increments of the driving Lévy process to study the properties of a highly efficient estimation procedure for the parameters when observations are available of the CAR(1) process at uniformly spaced times 0,h,…,Nh. We also show how to reconstruct the background driving Lévy process from a continuously observed realization of the process and use this result to estimate the increments of the Lévy process itself when h is small. Asymptotic properties of the coefficient estimator are derived and the results illustrated using a simulated gamma-driven Ornstein-Uhlenbeck process.
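
To make the role of nonnegative increments concrete, here is a minimal sketch under stated assumptions. Sampling a CAR(1) process at spacing h gives Y(nh) = e^{-ah} Y((n-1)h) + Z_n with Z_n ≥ 0, so every ratio Y(nh)/Y((n-1)h) is at least e^{-ah}, and the smallest observed ratio is a natural estimate of e^{-ah}. Whether this minimum-ratio estimator matches the paper's procedure in every detail is an assumption, not something the abstract states. The driver below is a compound Poisson process with exponential jumps, chosen for ease of simulation in place of the gamma process mentioned in the abstract, and the function names and parameters are hypothetical.

```python
import math
import random

def simulate_car1(a, h, n_obs, jump_rate, jump_mean, y0=1.0, seed=0):
    """Sample Y(0), Y(h), ..., Y(n_obs*h) for dY = -a*Y dt + dL.

    L is a compound Poisson process with exponential jumps, i.e. a
    nondecreasing Levy process (a stand-in for the gamma driver).
    """
    rng = random.Random(seed)
    phi = math.exp(-a * h)
    ys = [y0]
    for _ in range(n_obs):
        # Sum the jumps inside one sampling interval, each decayed from
        # its arrival time t to the end of the interval.
        z = 0.0
        t = rng.expovariate(jump_rate)
        while t < h:
            z += math.exp(-a * (h - t)) * rng.expovariate(1.0 / jump_mean)
            t += rng.expovariate(jump_rate)
        ys.append(phi * ys[-1] + z)
    return ys

def min_ratio_estimate(ys, h):
    """Estimate phi = exp(-a*h) by the smallest successive ratio, then solve for a."""
    phi_hat = min(curr / prev for prev, curr in zip(ys, ys[1:]))
    return phi_hat, -math.log(phi_hat) / h

ys = simulate_car1(a=0.8, h=0.5, n_obs=500, jump_rate=1.0, jump_mean=2.0)
phi_hat, a_hat = min_ratio_estimate(ys, h=0.5)
print(phi_hat, a_hat)
```

Because each Z_n is nonnegative, the minimum ratio cannot fall below the true e^{-ah} except through floating-point rounding, so the implied estimate of a errs only on the low side.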


2018 ◽  
Vol 7 (4) ◽  
pp. 104
Author(s):  
Conlet Biketi Kikechi ◽  
Richard Onyino Simwa

This article discusses the local polynomial regression estimator for  and the local polynomial regression estimator for  in a finite population. The performance criterion exploited in this study focuses on the efficiency of the finite population total estimators. Further, the discussion explores analytical comparisons between the two estimators with respect to asymptotic relative efficiency. In particular, asymptotic properties of the local polynomial regression estimator of finite population total for  are derived in a model based framework. The results of the local polynomial regression estimator for  are compared with those of the local polynomial regression estimator for  studied by Kikechi et al (2018). Variance comparisons are made using the local polynomial regression estimator  for  and the local polynomial regression estimator  for  which indicate that the estimators are asymptotically equivalently efficient. Simulation experiments carried out show that the local polynomial regression estimator  outperforms the local polynomial regression estimator  in the linear, quadratic and bump populations.


1994 ◽  
Vol 05 (03) ◽  
pp. 513-518 ◽  
Author(s):  
Dietrich Stauffer

The high-dimensional shape space for the antibodies of the immune system is simulated with an Ising-like interaction. However, instead of the molecular field being linear in the sum of the neighbor spins, we take it as quadratic and negative. In this way the bell-shaped response curve of biological immune systems is approximated, as a probabilistic generalization of window automata. We find phase transitions only in five and more dimensions, not in two to four, for nearest-neighbor interactions.

