NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods

Data normalization is a crucial step in gene expression analysis because it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate current normalization methods, different metrics yield inconsistent results. In this study, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it together with another metric, mSCC, to evaluate 14 commonly used normalization methods. This achieved consistent evaluation results on both bulk RNA-seq and scRNA-seq data generated with the same library construction protocol. This consistency validates the underlying theory that a successful normalization method simultaneously maximizes the number of uniform genes and minimizes the correlation between the expression profiles of gene pairs. It can also be used to assess the quality of gene expression data. The gene expression data, normalization methods and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast, simple way for researchers to evaluate methods (particularly data-driven methods or their own methods) and then select the best one for data normalization in gene expression analysis.
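The abstract names AUCVC but does not give its formula. As a rough illustration only, the Python sketch below makes two assumptions. First, a gene counts as "uniform" at threshold t when its coefficient of variation (CV) across samples is at most t. Second, AUCVC is the area under the curve of the fraction of uniform genes as t sweeps a fixed range rescaled to [0, 1]. The function name `aucvc`, the `max_cv` range and the trapezoidal integration are hypothetical choices and are not NormExpression's implementation.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample CV (standard deviation / mean) of one gene across samples."""
    mean = statistics.fmean(values)
    if mean == 0:
        return float("inf")  # treat all-zero genes as maximally non-uniform
    return statistics.stdev(values) / mean


def aucvc(expr: Sequence[Sequence[float]], max_cv: float = 1.0, steps: int = 100) -> float:
    """Hypothetical AUCVC: area under the curve of the fraction of genes whose
    CV is at or below a threshold, as the threshold sweeps 0..max_cv.
    Rows of `expr` are genes, columns are samples (at least two per gene).
    The threshold axis is rescaled to [0, 1], so the result lies in [0, 1]."""
    cvs = [coefficient_of_variation(gene) for gene in expr]
    n = len(cvs)
    thresholds = [max_cv * i / steps for i in range(steps + 1)]
    fractions = [sum(cv <= t for cv in cvs) / n for t in thresholds]
    # Trapezoidal rule with spacing 1/steps on the normalized axis.
    return sum((fractions[i] + fractions[i + 1]) / 2 for i in range(steps)) / steps
```

Under this assumed definition, a normalization that leaves more genes with a low CV yields a larger area. That matches the abstract's criterion of maximizing the number of uniform genes.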


Space-log: a novel approach to inferring gene-gene networks using SPACE model with log penalty

F1000Research ◽

10.12688/f1000research.26128.2 ◽

2022 ◽

Vol 9 ◽

pp. 1159

Author(s):

Qian (Vicky) Wu ◽

Wei Sun ◽

Li Hsu

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Networks ◽

Regulatory Networks ◽

Penalized Regression ◽

R Package ◽

Expression Data ◽

Computationally Efficient ◽

P Gene ◽

Novel Approach

Gene expression data have been used to infer gene-gene networks (GGN), in which an edge between two genes implies the conditional dependence of these two genes given all the other genes. Such gene-gene networks are often referred to as gene regulatory networks because they may reveal expression regulation. Most existing methods for identifying GGN employ penalized regression with an L1 (lasso), L2 (ridge), or elastic net penalty. The elastic net penalty spans the range from the L1 to the L2 penalty. However, high-dimensional gene expression data often require a penalty that spans the range from the L0 to the L1 penalty, such as the log penalty, to achieve variable selection consistency. We therefore develop a novel method that employs the log penalty within the framework of an earlier network identification method, space (Sparse PArtial Correlation Estimation), and implement it in an R package, space-log. We show that space-log is computationally efficient (its source code is implemented in C) and performs well compared with other methods, particularly for networks with hubs. Space-log is open source and available at GitHub, https://github.com/wuqian77/SpaceLog
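The claim that the log penalty spans the range from L0 to L1 can be made concrete with a short sketch. It uses one common parameterization of the log penalty, λ·log(1 + |β|/τ). Whether space-log uses exactly this form is an assumption, and the function names are hypothetical.

- **Large τ:** multiplying the penalty by τ and letting τ grow recovers the L1 penalty λ|β|.
- **Small τ:** as τ shrinks, the penalty becomes nearly the same for any nonzero β, up to a common scale, which is L0-like behaviour.
- **Derivative:** λ/(τ + |β|) decreases for large coefficients, so strong edges are shrunk less than under the lasso.

```python
import math


def l1_penalty(beta: float, lam: float) -> float:
    """Lasso penalty lam * |beta|."""
    return lam * abs(beta)


def log_penalty(beta: float, lam: float, tau: float) -> float:
    """Assumed log-penalty form lam * log(1 + |beta| / tau); zero at beta = 0."""
    return lam * math.log1p(abs(beta) / tau)


def log_penalty_derivative(beta: float, lam: float, tau: float) -> float:
    """Marginal shrinkage for beta != 0; it decays as |beta| grows,
    unlike the constant lam of the L1 penalty."""
    return lam / (tau + abs(beta))


# L1 limit: tau * log(1 + |b| / tau) -> |b| as tau -> infinity.
print(1e6 * log_penalty(2.0, 1.0, 1e6), l1_penalty(2.0, 1.0))
# L0-like limit: penalties of different nonzero betas become nearly equal as tau -> 0.
print(log_penalty(2.0, 1.0, 1e-100) / log_penalty(1.0, 1.0, 1e-100))
```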


sgnesR: An R package for simulating gene expression data from an underlying real gene network structure considering delay parameters

BMC Bioinformatics ◽

10.1186/s12859-017-1731-8 ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 6

Author(s):

Shailesh Tripathi ◽

Jason Lloyd-Price ◽

Andre Ribeiro ◽

Olli Yli-Harja ◽

Matthias Dehmer ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Network Structure ◽

Gene Network ◽

R Package ◽

Expression Data ◽

Real Gene


MADE4: an R package for multivariate analysis of gene expression data

Bioinformatics ◽

10.1093/bioinformatics/bti394 ◽

2005 ◽

Vol 21 (11) ◽

pp. 2789-2790 ◽

Cited By ~ 227

Author(s):

A. C. Culhane ◽

J. Thioulouse ◽

G. Perriere ◽

D. G. Higgins

Keyword(s):

Gene Expression ◽

Multivariate Analysis ◽

Gene Expression Data ◽

R Package ◽

Expression Data


Space-log: a novel approach to inferring gene-gene networks using SPACE model with log penalty

F1000Research ◽

10.12688/f1000research.26128.1 ◽

2020 ◽

Vol 9 ◽

pp. 1159

Author(s):

Qian (Vicky) Wu ◽

Wei Sun ◽

Li Hsu

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Networks ◽

Regulatory Networks ◽

Penalized Regression ◽

R Package ◽

Expression Data ◽

Computationally Efficient ◽

P Gene ◽

Novel Approach



NPA: an R package for computing network perturbation amplitudes using gene expression data and two-layer networks

BMC Bioinformatics ◽

10.1186/s12859-019-3016-x ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Florian Martin ◽

Sylvain Gubian ◽

Marja Talikka ◽

Julia Hoeng ◽

Manuel C. Peitsch

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

R Package ◽

Expression Data


MergeMaid: R Tools for Merging and Cross-Study Validation of Gene Expression Data

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1046 ◽

2004 ◽

Vol 3 (1) ◽

pp. 1-13 ◽

Cited By ~ 12

Author(s):

Leslie Cope ◽

Xiaogang Zhong ◽

Elizabeth Garrett ◽

Giovanni Parmigiani

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Cox Regression ◽

Genomic Analysis ◽

R Package ◽

Expression Data ◽

Multiple Gene ◽

Pairwise Correlations ◽

Visualization Tools ◽

Integrative Correlation

Cross-study validation of gene expression investigations is critical in genomic analysis. We developed an R package and associated object definitions to merge and visualize multiple gene expression datasets. Our merging functions use arbitrary character IDs and generate objects that can efficiently support a variety of joint analyses. Visualization tools support exploration and cross-study validation of the data without requiring normalization across platforms. Tools include “integrative correlation” plots, that is, scatterplots of all pairwise correlations in one study against the corresponding pairwise correlations in another, both for individual genes and for all genes combined. Gene-specific plots can be used to identify genes whose changes are reliably measured across studies. Visualizations also include scatterplots of gene-specific statistics quantifying relationships between expression and phenotypes of interest, using linear, logistic and Cox regression.


CORM: An R Package Implementing the Clustering of Regression Models Method for Gene Clustering

Cancer Informatics ◽

10.4137/cin.s13967 ◽

2014 ◽

Vol 13s4 ◽

pp. CIN.S13967

Author(s):

Jiejun Shi ◽

Li-Xuan Qin

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Regression Models ◽

R Package ◽

Gene Clustering ◽

Expression Data

We report a new R package implementing the clustering of regression models (CORM) method for clustering genes using gene expression data and provide data examples illustrating each clustering function in the package. The CORM package is freely available at CRAN from http://cran.r-project.org.


graphsim: An R package for simulating gene expression data from graph structures of biological pathways

10.1101/2020.03.02.972471 ◽

2020 ◽

Author(s):

S. Thomas Kelly ◽

Michael A. Black

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Networks ◽

Regulatory Networks ◽

Large Scale ◽

Simulated Data ◽

R Package ◽

Biological Pathways ◽

Graph Structure ◽

Expression Data

Transcriptomic analysis is used to capture the molecular state of a cell or sample in many biological and medical applications. In addition to identifying alterations in activity at the level of individual genes, understanding changes in the gene networks that regulate fundamental biological mechanisms is also an important objective of molecular analysis. As a result, databases that describe biological pathways are increasingly used to assist with the interpretation of results from large-scale genomics studies. Incorporating information from biological pathways and gene regulatory networks into a genomic data analysis is a popular strategy, and many methods provide this functionality for gene expression data. When developing or comparing such methods, it is important to gain an accurate assessment of their performance. Simulation-based validation studies are frequently used for this purpose. This necessitates simulated data that correctly account for pathway relationships and correlations. Here we present a versatile statistical framework to simulate correlated gene expression data from biological pathways by sampling from a multivariate normal distribution derived from a graph structure. This procedure has been released as the graphsim R package on CRAN and GitHub (https://github.com/TomKellyGenetics/graphsim) and is compatible with any graph structure that can be described using the igraph package. This package allows the simulation of biological pathways from a graph structure based on a statistical model of gene expression.
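The core idea can be sketched in plain Python: draw correlated expression values from a multivariate normal distribution whose covariance is derived from a graph. This is not graphsim's algorithm. The decay rule Σ_ij = ρ^d(i,j), where d is the hop distance between genes, and every function name below are assumptions for illustration. For 0 < ρ < 1 this rule gives a valid covariance matrix on tree-shaped graphs, such as the path used below, but not on every graph. The Cholesky step therefore checks positive definiteness.

```python
import math
import random
from collections import deque


def shortest_path_lengths(n: int, edges: list[tuple[int, int]]) -> list[list[float]]:
    """All-pairs hop distances in an undirected graph (BFS from each node)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] == math.inf:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def covariance_from_graph(n: int, edges: list[tuple[int, int]], rho: float = 0.8) -> list[list[float]]:
    """Assumed rule: correlation decays as rho ** (hop distance);
    genes in different connected components are uncorrelated."""
    d = shortest_path_lengths(n, edges)
    return [[0.0 if d[i][j] == math.inf else rho ** d[i][j] for j in range(n)]
            for i in range(n)]


def cholesky(a: list[list[float]]) -> list[list[float]]:
    """Lower-triangular L with L L^T = a; raises if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                diag = a[i][i] - s
                if diag <= 0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][j] = math.sqrt(diag)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_expression(n: int, edges: list[tuple[int, int]], n_samples: int,
                        rho: float = 0.8, seed: int = 0) -> list[list[float]]:
    """Draw samples x = L z with z ~ N(0, I), so that Cov(x) = L L^T = Sigma."""
    low = cholesky(covariance_from_graph(n, edges, rho))
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        samples.append([sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return samples
```

For example, `simulate_expression(4, [(0, 1), (1, 2), (2, 3)], n_samples=100)` simulates four genes on a linear pathway. In the result, adjacent genes are more strongly correlated than distant ones.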
