L1 Penalized Regression Procedures for Feature Selection

AbstractMachine learning feature selection methods are needed to detect complex interaction-network effects in complicated modeling scenarios in high-dimensional data, such as GWAS, gene expression, eQTL, and structural/functional neuroimage studies for case-control or continuous outcomes. In addition, many machine learning methods have limited ability to address the issues of controlling false discoveries and adjusting for covariates. To address these challenges, we develop a new feature selection technique called Nearest-neighbor Projected-Distance Regression (NPDR) that calculates the importance of each predictor using generalized linear model (GLM) regression of distances between nearest-neighbor pairs projected onto the predictor dimension. NPDR captures the underlying interaction structure of data using nearest-neighbors in high dimensions, handles both dichotomous and continuous outcomes and predictor data types, statistically corrects for covariates, and permits statistical inference and penalized regression. We use realistic simulations with interactions and other effects to show that NPDR has better precision-recall than standard Relief-based feature selection and random forest importance, with the additional benefit of covariate adjustment and multiple testing correction. Using RNA-Seq data from a study of major depressive disorder (MDD), we show that NPDR with covariate adjustment removes spurious associations due to confounding. We apply NPDR to eQTL data to identify potentially interacting variants that regulate transcripts associated with MDD and demonstrate NPDR’s utility for GWAS and continuous outcomes.

Download Full-text

Nearest-neighbor Projected-Distance Regression (NPDR) for detecting network interactions with adjustments for multiple tests and confounding

Bioinformatics ◽

10.1093/bioinformatics/btaa024 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2770-2777

Author(s):

Trang T Le ◽

Bryan A Dawkins ◽

Brett A McKinney

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Nearest Neighbor ◽

Penalized Regression ◽

Supplementary Information ◽

Covariate Adjustment ◽

Data Types ◽

Multiple Testing Correction ◽

Continuous Outcomes ◽

Projected Distance

Abstract Summary Machine learning feature selection methods are needed to detect complex interaction-network effects in complicated modeling scenarios in high-dimensional data, such as GWAS, gene expression, eQTL and structural/functional neuroimage studies for case–control or continuous outcomes. In addition, many machine learning methods have limited ability to address the issues of controlling false discoveries and adjusting for covariates. To address these challenges, we develop a new feature selection technique called Nearest-neighbor Projected-Distance Regression (NPDR) that calculates the importance of each predictor using generalized linear model regression of distances between nearest-neighbor pairs projected onto the predictor dimension. NPDR captures the underlying interaction structure of data using nearest-neighbors in high dimensions, handles both dichotomous and continuous outcomes and predictor data types, statistically corrects for covariates, and permits statistical inference and penalized regression. We use realistic simulations with interactions and other effects to show that NPDR has better precision-recall than standard Relief-based feature selection and random forest importance, with the additional benefit of covariate adjustment and multiple testing correction. Using RNA-Seq data from a study of major depressive disorder (MDD), we show that NPDR with covariate adjustment removes spurious associations due to confounding. We apply NPDR to eQTL data to identify potentially interacting variants that regulate transcripts associated with MDD and demonstrate NPDR’s utility for GWAS and continuous outcomes. Availability and implementation Available at: https://insilico.github.io/npdr/. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Feature Selection Strategies and Perceptual Expertise in Configuration Search Tasks

PsycEXTRA Dataset ◽

10.1037/e520602012-523 ◽

2011 ◽

Cited By ~ 1

Author(s):

Lindsey M. Kitchell ◽

Francisco J. Parada ◽

Brandi L. Emerick ◽

Tom A. Busey

Keyword(s):

Feature Selection ◽

Perceptual Expertise ◽

Search Tasks ◽

Selection Strategies

Download Full-text

SyntcRec: a Syntactic Recommender System Based on Improved Feature Selection Technique in Large Scholarly Data

International Journal on Communications Antenna and Propagation (IRECAP) ◽

10.15866/irecap.v7i6.13353 ◽

2017 ◽

Vol 7 (6) ◽

pp. 537

Author(s):

Deepa Mandave ◽

Govind Pole

Keyword(s):

Feature Selection ◽

Recommender System ◽

Feature Selection Technique ◽

Selection Technique ◽

Scholarly Data

Download Full-text

BLIND FEATURE SELECTION AND EXTRACTION IN A 3D IMAGE CUBE

Journal of Flow Visualization and Image Processing ◽

10.1615/jflowvisimageproc.2012005150 ◽

2012 ◽

Vol 19 (2) ◽

pp. 97-111 ◽

Cited By ~ 2

Author(s):

Muhammad Ahmad ◽

Syungyoung Lee ◽

Ihsan Ul Haq ◽

Qaisar Mushtaq

Keyword(s):

Feature Selection ◽

3D Image ◽

Image Cube

Download Full-text

An Enhancement of Feature Selection Algorithm for EDM: A Review

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v8i5.661 ◽

2018 ◽

Vol 8 (5) ◽

pp. 29

Author(s):

Manpreet Kaur ◽

Chamkaur Singh

Keyword(s):

Feature Selection ◽

Educational Data Mining ◽

Problem Formulation ◽

Research Area ◽

Education Quality ◽

Educational Institutions ◽

Selection Algorithm ◽

Positive Role ◽

Data Set ◽

Selection Algorithms

Educational Data Mining (EDM) is an emerging research area help the educational institutions to improve the performance of their students. Feature Selection (FS) algorithms remove irrelevant data from the educational dataset and hence increases the performance of classifiers used in EDM techniques. This paper present an analysis of the performance of feature selection algorithms on student data set. .In this papers the different problems that are defined in problem formulation. All these problems are resolved in future. Furthermore the paper is an attempt of playing a positive role in the improvement of education quality, as well as guides new researchers in making academic intervention.

Download Full-text

Feature selection of the armature winding broken coils in synchronous motor using genetic algorithm and mahalanobis distance

Archives of Metallurgy and Materials ◽

10.2478/v10172-012-0091-7 ◽

2012 ◽

Vol 57 (3) ◽

pp. 829-835 ◽

Cited By ~ 1

Author(s):

Z. Głowacz ◽

J. Kozik

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Mahalanobis Distance ◽

Distance Measure ◽

Synchronous Motor ◽

Medical Diagnostics ◽

Motor Current ◽

Feature Spaces ◽

Multidimensional Feature Spaces ◽

Selection Of

The paper describes a procedure for automatic selection of symptoms accompanying the break in the synchronous motor armature winding coils. This procedure, called the feature selection, leads to choosing from a full set of features describing the problem, such a subset that would allow the best distinguishing between healthy and damaged states. As the features the spectra components amplitudes of the motor current signals were used. The full spectra of current signals are considered as the multidimensional feature spaces and their subspaces are tested. Particular subspaces are chosen with the aid of genetic algorithm and their goodness is tested using Mahalanobis distance measure. The algorithm searches for such a subspaces for which this distance is the greatest. The algorithm is very efficient and, as it was confirmed by research, leads to good results. The proposed technique is successfully applied in many other fields of science and technology, including medical diagnostics.

Download Full-text