VIF-Regression Screening in Ultrahigh Dimensional Feature Space

2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Hassan S Uraibi

Iterative Sure Independence Screening (ISIS) was proposed for variable selection in ultrahigh dimensional feature spaces. Unfortunately, ISIS reduces the dimensionality of the feature space from ultrahigh to ultra-low, and may yield unreliable inference, particularly when the number of important variables exceeds the screening threshold. The proposed method instead reduces the ultrahigh dimensional feature space to a high dimensional one, in order to remedy the loss of information incurred by ISIS. The proposed method is compared with ISIS using real data and simulation. The results show that it is more efficient and more reliable than ISIS.
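The marginal screening step that both ISIS and the proposed VIF-regression approach build on can be illustrated with plain Sure Independence Screening; this is a minimal sketch of that baseline (not the VIF-regression procedure itself), with the usual n/log n threshold:

```python
import numpy as np

def sis_screen(X, y, d=None):
    """Sure Independence Screening: rank features by absolute marginal
    correlation with y and keep the top d (default n/log n, Fan & Lv 2008)."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    score = np.abs(Xs.T @ ys) / n          # |marginal correlation| per feature
    return np.sort(np.argsort(score)[::-1][:d])

rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 5] + rng.standard_normal(n)
kept = sis_screen(X, y)                    # 21 of 1000 features survive
```

The abstract's complaint is visible here: the retained set is tiny relative to p, so any important variable ranked below the threshold is lost for good.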

2021 ◽  
Author(s):  
Émeline Courtois ◽  
Pascale Tubert-Bitter ◽  
Ismaïl Ahmed

Abstract
Background: Adverse effects of drugs are often identified after market introduction. Post-marketing pharmacovigilance aims to detect them as early as possible and relies on spontaneous reporting systems that collect suspicious cases. Signal detection tools have been developed to mine these large databases, in which counts of reports are analysed with disproportionality methods. To address the biases of disproportionality methods, recent approaches work at the level of individual observations, taking into account all exposures for the same patient. In particular, the logistic lasso provides an efficient variable selection framework, yet the choice of the regularization parameter is a challenging issue and lasso variable selection may be inconsistent. Methods: We propose a new signal detection methodology based on the adaptive lasso in a high-dimensional setting. We derive two new adaptive weights from (i) a lasso regression tuned with the Bayesian Information Criterion (BIC), and (ii) the class-imbalanced subsampling lasso (CISL), an extension of stability selection. The BIC is also used in the adaptive lasso stage for variable selection. We performed an extensive simulation study and an application to real data, in which we compared our methods to the existing adaptive lasso and to recent detection approaches based on lasso regression or on propensity scores in high dimension. In both studies, we evaluate the methods in terms of false discoveries and sensitivity. Results: In the simulations and the application, both proposed adaptive weights show performance equivalent to or better than that of the other competitors, with an advantage for the CISL-based adaptive weights. CISL and lasso regression tuned with the BIC are solid alternatives. Conclusion: Our proposed adaptive lasso is an appealing methodology for signal detection in pharmacovigilance. Although we cannot rely on test theory, our approaches show a low and stable false discovery rate in all simulation settings.
All methods evaluated in this work are implemented in the adapt4pv R package.
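The generic adaptive-lasso recipe with a BIC-tuned pilot fit can be sketched as below. This is an illustration only, not the adapt4pv implementation: the paper's setting is a logistic lasso on case reports, simplified here to a linear response, and the final `alpha=0.05` is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoLarsIC

rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * rng.standard_normal(n)

# Stage 1: BIC-tuned lasso supplies pilot coefficients.
pilot = LassoLarsIC(criterion="bic").fit(X, y)
w = 1.0 / (np.abs(pilot.coef_) + 1e-8)     # adaptive weights: large where pilot is ~0

# Stage 2: adaptive lasso = plain lasso on rescaled columns.
fit = Lasso(alpha=0.05).fit(X / w, y)
coef = fit.coef_ / w                        # map back to the original scale
selected = np.flatnonzero(np.abs(coef) > 1e-8)
```

Variables the pilot fit zeroes out receive an essentially infinite penalty in stage 2, which is the mechanism behind the improved consistency of the adaptive lasso over the plain lasso.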


2021 ◽  
Author(s):  
Mikhail Kanevski

Nowadays a wide range of methods and tools to study and forecast time series is available. An important problem in forecasting concerns the embedding of time series, i.e. the construction of a high-dimensional space in which the forecasting problem is treated as a regression task. There are several basic linear and nonlinear approaches to constructing such a space by defining an optimal delay vector using different theoretical concepts. Another way is to consider this space as an input feature space (IFS) and to apply machine learning feature selection (FS) algorithms to optimize the IFS according to the problem under study (analysis, modelling or forecasting). Such an approach is an empirical one: it is based on data and depends on the FS algorithms applied. In machine learning, features are generally classified as relevant, redundant and irrelevant, which offers a rich possibility to perform advanced multivariate time series exploration and to develop interpretable predictive models.

Therefore, in the present research different FS algorithms are used to analyze fundamental properties of time series from an empirical point of view. Linear and nonlinear simulated time series are studied in detail to understand the advantages and drawbacks of the proposed approach. Real-data case studies deal with air pollution and wind speed time series. Preliminary results are quite promising and more research is in progress.
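The delay-vector construction the abstract describes, which turns forecasting into regression over an input feature space, can be sketched as follows; `dim` and `tau` (embedding dimension and delay) are chosen here purely for illustration:

```python
import numpy as np

def delay_embed(x, dim, tau=1):
    """Rows are delay vectors (x_i, x_{i+tau}, ..., x_{i+(dim-1)tau});
    the target is the next observation, so forecasting becomes regression."""
    last = (dim - 1) * tau            # offset of the newest lag in a row
    n = len(x) - last - 1             # number of complete (row, target) pairs
    X = np.column_stack([x[k * tau : k * tau + n] for k in range(dim)])
    y = x[last + 1 : last + 1 + n]
    return X, y

X, y = delay_embed(np.arange(10.0), dim=3, tau=1)
# X[0] = [0., 1., 2.] and y[0] = 3.0
```

Each column of X is one candidate lag feature, so any FS algorithm (filter, wrapper or embedded) can then rank the lags as relevant, redundant or irrelevant, which is the empirical optimization of the IFS described above.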


Author(s):  
Xuan Cao ◽  
Lili Ding ◽  
Tesfaye B. Mersha

Abstract
In this study, we conduct a comparison of three recent statistical methods for joint variable selection and covariance estimation, with application to detecting expression quantitative trait loci (eQTL) and estimating gene networks, and introduce a new hierarchical Bayesian method to be included in the comparison. Unlike the traditional univariate regression approach in eQTL analysis, all four methods correlate phenotypes and genotypes through multivariate regression models that incorporate the dependence among phenotypes, and they use Bayesian multiplicity adjustment to avoid the multiple testing burden raised by traditional multiple testing correction methods. We present the performance of three methods (MSSL, the Multivariate Spike and Slab Lasso; SSUR, Sparse Seemingly Unrelated Bayesian Regression; and OBFBF, the Objective Bayes Fractional Bayes Factor), along with the proposed JDAG (Joint estimation via a Gaussian Directed Acyclic Graph model) method, through simulation experiments and publicly available HapMap real data, taking asthma as an example. Compared with the existing methods, JDAG identified networks with higher sensitivity and specificity under row-wise sparse settings. JDAG requires less execution time in small-to-moderate dimensions, but is not currently applicable to high-dimensional data. The eQTL analysis of the asthma data recovered a number of known gene regulations, such as STARD3, IKZF3 and PGAP3, all reported in asthma studies. The code of the proposed method is freely available at GitHub (https://github.com/xuan-cao/Joint-estimation-for-eQTL).


2019 ◽  
Vol 31 (8) ◽  
pp. 1718-1750
Author(s):  
Kota Matsui ◽  
Wataru Kumagai ◽  
Kenta Kanamori ◽  
Mitsuaki Nishikimi ◽  
Takafumi Kanamori

In this letter, we propose a variable selection method for general nonparametric kernel-based estimation. The proposed method consists of a two-stage estimation: (1) construct a consistent estimator of the target function, and (2) approximate the estimator using a few variables by ℓ1-type penalized estimation. We show that the proposed method can be applied to various kernel nonparametric estimators, such as kernel ridge regression, kernel density estimation, and density-ratio estimation. We prove that the proposed method enjoys variable selection consistency when the power series kernel is used; here, the power series kernels are a class of kernels containing the polynomial and exponential kernels. This result can be regarded as an extension of the variable selection consistency of the nonnegative garrote (NNG), a special case of the adaptive lasso, to kernel-based estimators. Several experiments, including simulation studies and real data applications, show the effectiveness of the proposed method.
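The two-stage structure can be sketched loosely as below. This is an analogy, not the paper's method: the paper penalizes within a power-series-kernel expansion, whereas here stage 2 is a plain lasso surrogate fit to the stage-1 predictions, and all hyperparameters are illustrative:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 300, 10
X = rng.standard_normal((n, p))
y = np.sin(X[:, 0]) + 2 * X[:, 1] + 0.1 * rng.standard_normal(n)

# Stage 1: consistent nonparametric estimate of the target function.
stage1 = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.1).fit(X, y)
z = stage1.predict(X)

# Stage 2: sparse (l1-penalized) approximation of the stage-1 estimator,
# keeping only the few variables that drive the fitted surface.
stage2 = Lasso(alpha=0.1).fit(X, z)
selected = np.flatnonzero(np.abs(stage2.coef_) > 1e-6)
```

The point of the two stages is separation of concerns: stage 1 only needs consistency, while stage 2 handles interpretability through sparsity.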


2014 ◽  
Vol 989-994 ◽  
pp. 3675-3678
Author(s):  
Xiao Fen Wang ◽  
Hai Na Zhang ◽  
Xiu Rong Qiu ◽  
Jiang Ping Song ◽  
Ke Xin Zhang

Self-adaptive distance measure supervised locally linear embedding (ADM-SLLE) addresses the problem that the Euclidean distance measure cannot separate samples in content-based image retrieval. The method uses a discriminative distance measure to construct the k-NN graph and effectively preserves its topological structure in the high-dimensional space, while at the same time widening the intervals between samples and strengthening the ability to classify. Experimental results show that the ADM-SLLE dimensionality reduction method speeds up image retrieval and achieves a high retrieval accuracy rate.
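For reference, the baseline that ADM-SLLE modifies is standard locally linear embedding with Euclidean k-NN; a minimal sketch using scikit-learn (the supervised, discriminative metric of ADM-SLLE is the part this baseline lacks):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding

# Plain unsupervised LLE: neighbours are found with the Euclidean metric,
# which is exactly the limitation ADM-SLLE targets in image retrieval.
X, _ = load_digits(return_X_y=True)          # 1797 images, 64 features
Z = LocallyLinearEmbedding(n_components=2, n_neighbors=10).fit_transform(X)
# Z is the low-dimensional embedding used downstream for retrieval
```

Swapping the neighbourhood metric for a label-aware one changes which k-NN graph gets its topology preserved, which is where the widened class intervals come from.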


2017 ◽  
Vol 2017 ◽  
pp. 1-8
Author(s):  
Ali Alkenani ◽  
Tahir R. Dikheel

The elimination of insignificant predictors and the combination of predictors with indistinguishable coefficients are the two issues raised in searching for the true model. Pairwise Absolute Clustering and Sparsity (PACS) achieves both goals. Unfortunately, PACS is sensitive to outliers because it depends on the least-squares loss function, which is known to be very sensitive to unusual data. In this article, the sensitivity of PACS to outliers is studied. Robust versions of PACS (RPACS) are proposed by replacing the least-squares loss and the nonrobust weights in PACS with MM-estimation and with robust weights based on robust correlations instead of the Pearson correlation, respectively. A simulation study and two real-data applications are used to assess the effectiveness of the proposed methods.
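Why swapping the correlation matters can be shown with a single contaminated pair; here Spearman's rank correlation stands in for whatever robust correlation RPACS uses (an assumption of this sketch, not the paper's specific choice):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(3)
x = rng.standard_normal(100)
y = x + 0.1 * rng.standard_normal(100)
y[0] = 50.0                       # one gross outlier

# The rank-based correlation barely moves, while Pearson is dragged down
# by the single outlier -- so weights built from it stay trustworthy.
r_pearson = pearsonr(x, y)[0]
r_robust = spearmanr(x, y)[0]
```

Since the PACS weights are functions of pairwise correlations, a single outlier that corrupts Pearson correlations corrupts the clustering of coefficients as well; rank-based weights do not inherit that fragility.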


Author(s):  
Assi N'GUESSAN ◽  
Ibrahim Sidi Zakari ◽  
Assi Mkhadri

We consider the problem of variable selection via penalized likelihood using nonconvex penalty functions. To maximize the non-differentiable and nonconcave objective function, an algorithm based on local linear approximation (LLA), which adopts a naturally sparse representation, was recently proposed. However, although it has promising theoretical properties, it inherits some drawbacks of the lasso in the high-dimensional setting. To overcome these drawbacks, we propose an algorithm (MLLQA) for maximizing the penalized likelihood for a large class of nonconvex penalty functions. The convergence property of MLLQA and the oracle property of the one-step MLLQA estimator are established. Some simulations and an application to a real data set are also presented.
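The local linear approximation idea the abstract starts from can be sketched for the SCAD penalty: linearizing the nonconvex penalty at an initial estimate turns one update into a weighted lasso. This is the generic one-step LLA recipe, not MLLQA itself; `lam`, the weight floor, and the Gaussian model are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def scad_deriv(beta, lam, a=3.7):
    """Derivative of the SCAD penalty (Fan & Li, 2001)."""
    b = np.abs(beta)
    return np.where(b <= lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1.0))

rng = np.random.default_rng(4)
n, p = 200, 20
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] - 3 * X[:, 1] + 0.5 * rng.standard_normal(n)

# LLA: replace the SCAD penalty by its tangent at an initial estimate,
# giving a weighted lasso, solved here via the column-rescaling trick.
beta0 = LinearRegression().fit(X, y).coef_
lam = 0.2
w = np.maximum(scad_deriv(beta0, lam), 1e-3)   # per-coefficient weights
fit = Lasso(alpha=1.0).fit(X / w, y)
beta1 = fit.coef_ / w                           # one-step LLA estimate
selected = np.flatnonzero(np.abs(beta1) > 1e-8)
```

Large initial coefficients get a near-zero SCAD derivative and hence almost no shrinkage, which is the mechanism behind the oracle property of one-step estimators of this kind.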

