AFS-DEA: An automatic feature selection platform for differential expression analysis

A novel feature selection for RNA-seq analysis

10.1101/209841 ◽

2017 ◽

Author(s):

Henry Han

Keyword(s):

Feature Selection ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Feature Selection Method ◽

Selection Method ◽

Singular Value ◽

Data Driven ◽

RNA Seq ◽

Selection For

Abstract:

RNA-seq data challenge existing omics data analytics because of their volume and complexity. Although quite a few computational models have been proposed from different standpoints to conduct differential expression (D.E.) analysis, almost none of these methods provides rigorous feature selection for high-dimensional RNA-seq count data. Instead, most or even all genes are included in differential calls regardless of whether they make a real contribution to data variation. This inevitably affects the robustness of D.E. analysis and increases false positive ratios.

In this study, we present a novel feature selection method, nonnegative singular value approximation (NSVA), which enhances RNA-seq differential expression analysis by taking advantage of the non-negativity of RNA-seq count data. As a variance-based feature selection method, it selects genes according to their contribution to the first singular value direction of the input data in a data-driven manner. It demonstrates robustness to depth bias and gene length bias in feature selection in comparison with five peer methods. Combined with state-of-the-art RNA-seq differential expression analysis, it lowers the false discovery rates caused by these biases. Furthermore, we demonstrate the effectiveness of the proposed feature selection by proposing a data-driven differential expression analysis, NSVA-seq, and by conducting network marker discovery.
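The abstract does not spell out the NSVA algorithm, so the following is only a minimal sketch of the general idea it names: ranking genes by their contribution to the first singular direction of a nonnegative count matrix. The function name `rank_genes_by_first_singular_direction`, the squared-loading score, and the toy data are assumptions made for illustration, not the author's implementation.

```python
import numpy as np


def rank_genes_by_first_singular_direction(counts, top_k):
    """Rank genes (rows) of a nonnegative genes x samples matrix.

    Hypothetical helper: scores each gene by its squared loading on the
    first left singular vector, which is one plausible reading of
    "contribution to the first singular value direction".
    """
    counts = np.asarray(counts, dtype=float)
    if (counts < 0).any():
        raise ValueError("count data must be nonnegative")

    # Thin SVD: counts = u @ diag(s) @ vt, singular values in descending order.
    u, s, vt = np.linalg.svd(counts, full_matrices=False)

    # For a nonnegative matrix the leading singular vector can be chosen with
    # no negative entries; abs() removes the arbitrary sign returned by the SVD.
    loading = np.abs(u[:, 0])

    # Squared loadings sum to 1 because the singular vector has unit length.
    score = loading ** 2
    order = np.argsort(score)[::-1]
    return order[:top_k], score


# Toy matrix: 4 genes x 3 samples (made-up numbers).
toy_counts = [
    [100, 120, 90],
    [1, 0, 2],
    [50, 60, 40],
    [0, 1, 0],
]
selected, scores = rank_genes_by_first_singular_direction(toy_counts, top_k=2)
```

In this toy matrix the two highly expressed genes dominate the first singular direction, which also shows why a raw variance-based score favours high-count genes unless the data are normalized first.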

GEOlimma: Differential Expression Analysis and Feature Selection Using Pre-Existing Microarray Data

10.1101/693564 ◽

2019 ◽

Author(s):

Liangqun Lu ◽

Kevin A. Townsend ◽

Bernie J. Daigle

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Feature Selection Method ◽

Classification Performance ◽

Prior Probabilities ◽

Transcriptomics Data ◽

Differential Gene

Abstract:

Background: Differential expression and feature selection analyses are essential steps for the development of accurate diagnostic/prognostic classifiers of complicated human diseases using transcriptomics data. These steps are particularly challenging due to the curse of dimensionality and the presence of technical and biological noise. A promising strategy for overcoming these challenges is the incorporation of pre-existing transcriptomics data in the identification of differentially expressed (DE) genes. This approach has the potential to improve the quality of selected genes, increase classification performance, and enhance biological interpretability. While a number of methods have been developed that use pre-existing data for differential expression analysis, existing methods do not leverage the identities of experimental conditions to create a robust metric for identifying DE genes.

Results: In this study, we propose a novel differential expression and feature selection method, GEOlimma, which combines pre-existing microarray data from the Gene Expression Omnibus (GEO) with the widely applied Limma method for differential expression analysis. We first quantify differential gene expression across 2481 pairwise comparisons from 602 curated GEO Datasets, and we convert differential expression frequencies to DE prior probabilities. Genes with high DE prior probabilities show enrichment in cell growth and death, signal transduction, and cancer-related biological pathways, while genes with low prior probabilities are enriched in sensory system pathways. We then applied GEOlimma to four differential expression comparisons within two human disease datasets and performed differential expression, feature selection, and supervised classification analyses. Our results suggest that use of GEOlimma provides greater experimental power to detect DE genes compared to Limma, due to its increased effective sample size. Furthermore, in a supervised classification analysis using GEOlimma as a feature selection method, we observed similar or better classification performance than Limma given small, noisy subsets of an asthma dataset.

Conclusions: Our results demonstrate that GEOlimma is a more effective method for differential gene expression and feature selection analyses compared to the standard Limma method. Due to its focus on gene-level differential expression, GEOlimma also has the potential to be applied to other high-throughput biological datasets.
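The abstract states that differential expression frequencies are converted to DE prior probabilities but does not give the formula. As a rough sketch of what such a conversion could look like, the example below counts how often each gene is called DE across comparisons and applies add-one (Laplace) smoothing. The function name `de_prior_probabilities`, the `pseudocount` parameter, and the smoothing itself are assumptions for illustration, not the GEOlimma procedure.

```python
import numpy as np


def de_prior_probabilities(de_calls, pseudocount=1.0):
    """Turn per-comparison DE calls into per-gene prior probabilities.

    de_calls: genes x comparisons boolean matrix, True where the gene was
    called differentially expressed in that comparison.
    The pseudocount (an assumption here) keeps priors strictly inside
    (0, 1), so no gene gets a prior of exactly 0 or 1.
    """
    de_calls = np.asarray(de_calls, dtype=bool)
    n_comparisons = de_calls.shape[1]

    # Number of comparisons in which each gene was called DE.
    de_frequency = de_calls.sum(axis=1)

    return (de_frequency + pseudocount) / (n_comparisons + 2.0 * pseudocount)


# Toy example: 3 genes across 4 pairwise comparisons (made-up calls).
toy_calls = [
    [1, 1, 1, 0],  # frequently DE
    [0, 0, 0, 0],  # never DE
    [1, 0, 0, 1],  # sometimes DE
]
priors = de_prior_probabilities(toy_calls)
```

With the default pseudocount, the three toy genes receive priors of 4/6, 1/6, and 3/6. In a real setting the matrix would have one column per pairwise comparison (2481 in the study).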

GEOlimma: differential expression analysis and feature selection using pre-existing microarray data

BMC Bioinformatics ◽

10.1186/s12859-020-03932-5 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Liangqun Lu ◽

Kevin A. Townsend ◽

Bernie J. Daigle

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Feature Selection Method ◽

Classification Performance ◽

Prior Probabilities ◽

Transcriptomics Data ◽

Differential Gene

Abstract:

Background: Differential expression and feature selection analyses are essential steps for the development of accurate diagnostic/prognostic classifiers of complicated human diseases using transcriptomics data. These steps are particularly challenging due to the curse of dimensionality and the presence of technical and biological noise. A promising strategy for overcoming these challenges is the incorporation of pre-existing transcriptomics data in the identification of differentially expressed (DE) genes. This approach has the potential to improve the quality of selected genes, increase classification performance, and enhance biological interpretability. While a number of methods have been developed that use pre-existing data for differential expression analysis, existing methods do not leverage the identities of experimental conditions to create a robust metric for identifying DE genes.

Results: In this study, we propose a novel differential expression and feature selection method, GEOlimma, which combines pre-existing microarray data from the Gene Expression Omnibus (GEO) with the widely applied Limma method for differential expression analysis. We first quantify differential gene expression across 2481 pairwise comparisons from 602 curated GEO Datasets, and we convert differential expression frequencies to DE prior probabilities. Genes with high DE prior probabilities show enrichment in cell growth and death, signal transduction, and cancer-related biological pathways, while genes with low prior probabilities are enriched in sensory system pathways. We then applied GEOlimma to four differential expression comparisons within two human disease datasets and performed differential expression, feature selection, and supervised classification analyses. Our results suggest that use of GEOlimma provides greater experimental power to detect DE genes compared to Limma, due to its increased effective sample size. Furthermore, in a supervised classification analysis using GEOlimma as a feature selection method, we observed similar or better classification performance than Limma given small, noisy subsets of an asthma dataset.

Conclusions: Our results demonstrate that GEOlimma is a more effective method for differential gene expression and feature selection analyses compared to the standard Limma method. Due to its focus on gene-level differential expression, GEOlimma also has the potential to be applied to other high-throughput biological datasets.

ECFS-DEA: an ensemble classifier-based feature selection for differential expression analysis on expression profiles

BMC Bioinformatics ◽

10.1186/s12859-020-3388-y ◽

2020 ◽

Vol 21 (1) ◽

Cited By ~ 16

Author(s):

Xudong Zhao ◽

Qing Jiao ◽

Hangyu Li ◽

Yiming Wu ◽

Hanxu Wang ◽

...

Keyword(s):

Feature Selection ◽

Differential Expression ◽

Expression Analysis ◽

Expression Profiles ◽

Differential Expression Analysis ◽

Ensemble Classifier ◽

Selection For

Faculty Opinions recommendation of Differential expression analysis for sequence count data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.6932959.7123057 ◽

2010 ◽

Author(s):

Sarah Teichmann ◽

Daniel Hebenstreit

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Count Data ◽

Differential Expression Analysis

Best practices on the differential expression analysis of multi-species RNA-seq

Genome Biology ◽

10.1186/s13059-021-02337-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Matthew Chung ◽

Vincent M. Bruno ◽

David A. Rasko ◽

Christina A. Cuomo ◽

José F. Muñoz ◽

...

Keyword(s):

Best Practices ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Single Species ◽

RNA Seq ◽

Species Analysis ◽

Differential Gene ◽

Multiple Species ◽

Downstream Analysis

Abstract:

Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.
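The abstract notes that each organism's relative abundance, and therefore its total read count, differs across samples. One way to illustrate the consequence is to scale counts within each species separately, so that a shift in the host-to-pathogen ratio is not mistaken for a change in gene expression. The sketch below computes counts per million (CPM) per species. The function name `per_species_cpm`, the species labels, and the toy counts are assumptions, and this is not presented as the paper's recommended pipeline.

```python
import numpy as np


def per_species_cpm(counts, species_of_gene):
    """Counts per million, with library size computed within each species.

    counts: genes x samples matrix of read counts.
    species_of_gene: one label per gene (row) naming its organism.
    """
    counts = np.asarray(counts, dtype=float)
    species_of_gene = np.asarray(species_of_gene)
    cpm = np.zeros_like(counts)

    for species in np.unique(species_of_gene):
        rows = species_of_gene == species
        # Per-sample total reads assigned to this species only.
        lib_size = counts[rows].sum(axis=0)
        # Guard against samples with zero reads for this species.
        safe_lib_size = np.where(lib_size > 0, lib_size, 1.0)
        cpm[rows] = counts[rows] / safe_lib_size * 1e6

    return cpm


# Toy data: 2 host genes and 2 pathogen genes in 2 samples (made-up numbers).
# Sample 2 has 10x more host reads and 5x fewer pathogen reads than sample 1.
toy_counts = [
    [90, 900],
    [10, 100],
    [5, 1],
    [15, 3],
]
toy_species = ["host", "host", "pathogen", "pathogen"]
toy_cpm = per_species_cpm(toy_counts, toy_species)
```

In the toy data the within-species proportions are identical in both samples, so the per-species CPM values match across samples even though the raw totals differ widely.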