gene set testing
Recently Published Documents


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jovana Maksimovic ◽  
Alicia Oshlack ◽  
Belinda Phipson

Abstract DNA methylation is one of the most commonly studied epigenetic marks, due to its role in disease and development. Illumina methylation arrays have been extensively used to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalization, and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate GOmeth outperforms other approaches, and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.


2021 ◽  
Author(s):  
Eric Van Buren ◽  
Ming Hu ◽  
Liang Cheng ◽  
John Wrobel ◽  
Kirk Wilhelmsen ◽  
...  

Abstract We propose TWO-SIGMA-G, a competitive gene set test designed for scRNA-seq data. TWO-SIGMA-G uses the mixed-effects regression modelling approach of our previously published TWO-SIGMA to test for differential expression at the gene-level. This regression-based approach can analyze complex designs while accommodating zero-inflated and overdispersed counts and within-sample cell-cell correlation. TWO-SIGMA-G uses a novel approach to adjust for inter-gene-correlation (IGC) at the set-level, which can inflate type-I error when ignored. Simulations demonstrate that TWO-SIGMA-G preserves type-I error and increases power in the presence of IGC compared to other methods designed for bulk and single-cell RNA-seq data. Application to two real datasets of HIV infection in mice and Alzheimer’s disease progression in humans reveals biologically meaningful results. TWO-SIGMA-G is available at https://github.com/edvanburen/twosigma.


2020 ◽  
Vol 8 (6) ◽  
pp. 549-564
Author(s):  
Min Xiao ◽  
Ting Chen ◽  
Kunpeng Huang ◽  
Ruixing Ming

Abstract Detecting differential expression of genes is a common task in genomic research (e.g., on 2019-nCoV). Because of cost, only a small sample is typically available to estimate a large number of variances (or their inverses) simultaneously, and the commonly used approaches perform unreliably in this setting. Shrinkage estimation approaches, which borrow information across variables or use prior information about the variables, have been proposed, and some optimal shrinkage estimators have been obtained in an asymptotic sense. In this paper, we focus on the small-sample setting and give a likelihood-unbiased estimator for powers of the variances under the assumption that the variances follow a chi-squared distribution. Simulation results show that the likelihood-unbiased estimators for the variances and their inverses perform very well. In addition, an application comparison and a real data analysis indicate that the proposed estimator also works well.


Author(s):  
Jovana Maksimovic ◽  
Alicia Oshlack ◽  
Belinda Phipson

Abstract DNA methylation is one of the most commonly studied epigenetic marks, due to its role in disease and development. Illumina methylation arrays have been extensively used to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalisation and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate GOmeth outperforms other approaches and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.


2020 ◽  
Vol 48 (16) ◽  
pp. e94-e94 ◽  
Author(s):  
Hildreth Robert Frost

Abstract Statistical analysis of single cell RNA-sequencing (scRNA-seq) data is hindered by high levels of technical noise and inflated zero counts. One promising approach for addressing these challenges is gene set testing, or pathway analysis, which can mitigate sparsity and noise, and improve interpretation and power, by aggregating expression data to the pathway level. Unfortunately, methods optimized for bulk transcriptomics perform poorly on scRNA-seq data and progress on single cell-specific techniques has been limited. Importantly, no existing methods support cell-level gene set inference. To address this challenge, we developed a new gene set testing method, Variance-adjusted Mahalanobis (VAM), that integrates with the Seurat framework and can accommodate the technical noise, sparsity and large sample sizes characteristic of scRNA-seq data. The VAM method computes cell-specific pathway scores to transform a cell-by-gene matrix into a cell-by-pathway matrix that can be used for both data visualization and statistical enrichment analysis. Because the distribution of these scores under the null of uncorrelated technical noise has an accurate gamma approximation, both population and cell-level inference is supported. As demonstrated using simulated and real scRNA-seq data, the VAM method provides superior classification accuracy at a lower computation cost relative to existing single sample gene set testing approaches.


2020 ◽  
Author(s):  
H. Robert Frost

Abstract Single cell RNA sequencing (scRNA-seq) is a powerful tool for analyzing complex tissues with recent advances enabling the transcriptomic profiling of thousands to tens-of-thousands of individual cells. Although scRNA-seq provides unprecedented insights into the biology of heterogeneous cell populations, analyzing such data on a gene-by-gene basis is challenging due to the large number of tested hypotheses, high level of technical noise and inflated zero counts. One promising approach for addressing these challenges is gene set testing, or pathway analysis. By combining the expression data for all genes in a pathway, gene set testing can mitigate the impacts of sparsity and noise and improve interpretation, replication and statistical power. Unfortunately, statistical and biological differences between single cell and bulk expression measurements make it challenging to use gene set testing methods originally developed for bulk tissue on scRNA-seq data and progress on single cell-specific methods has been limited. To address this challenge, we have developed a new gene set testing method, variance-adjusted Mahalanobis (VAM), that seamlessly integrates with the Seurat framework and is designed to accommodate the technical noise, sparsity and large sample sizes characteristic of scRNA-seq data. The VAM method computes cell-specific pathway scores to transform a cell-by-gene matrix into a cell-by-pathway matrix that can be used for both exploratory data visualization and statistical gene set enrichment analysis. Because the distribution of these scores under the null of uncorrelated technical noise has an accurate gamma approximation, inference can be performed at both the population and single cell levels. As we demonstrate using both simulation studies and real data analyses, the VAM method provides superior classification accuracy at a lower computation cost relative to existing single sample gene set testing approaches.


2019 ◽  
Vol 21 (4) ◽  
pp. 1302-1312 ◽  
Author(s):  
Mitra Ebrahimpoor ◽  
Pietro Spitali ◽  
Kristina Hettne ◽  
Roula Tsonaka ◽  
Jelle Goeman

Abstract Studying sets of genomic features is increasingly popular in genomics, proteomics and metabolomics since analyzing at set level not only creates a natural connection to biological knowledge but also offers more statistical power. Currently, there are two gene-set testing approaches, self-contained and competitive, both of which have their advantages and disadvantages, but neither offers the final solution. We introduce simultaneous enrichment analysis (SEA), a new approach for analysis of feature sets in genomics and other omics based on a new unified null hypothesis, which includes the self-contained and competitive null hypotheses as special cases. We employ closed testing using Simes tests to test this new hypothesis. For every feature set, the proportion of active features is estimated, and a confidence bound is provided. Also, for every unified null hypothesis, a $P$-value is calculated, which is adjusted for family-wise error rate. SEA does not need to assume that the features are independent. Moreover, users are allowed to choose the feature set(s) of interest after observing the data. We develop a novel pipeline and apply it to RNA-seq data of dystrophin-deficient mdx mice, showcasing the flexibility of the method. Finally, the power properties of the method are evaluated through simulation studies.
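
The Simes test named in this abstract can be illustrated with a minimal Python sketch. It shows only the standard Simes combination of the per-feature p-values within one set, which tests the intersection null that no feature in the set is active. It is not the SEA closed-testing procedure itself, and the function name `simes_p_value` and the example p-values are hypothetical.

```python
def simes_p_value(p_values):
    """Simes combined p-value for the intersection (global) null hypothesis.

    With sorted p-values p_(1) <= ... <= p_(n), the Simes p-value is
    the minimum over i of n * p_(i) / i.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    n = len(p_values)
    ordered = sorted(p_values)
    # rank is the 1-based position of each p-value after sorting
    combined = min(n * p / rank for rank, p in enumerate(ordered, start=1))
    return min(1.0, combined)


# Hypothetical per-feature p-values for a single feature set
feature_p = [0.001, 0.02, 0.04, 0.30, 0.75]
print(simes_p_value(feature_p))
```

A small Simes p-value suggests that at least one feature in the set is active. Closed testing, as described in the abstract, builds on many such intersection tests to obtain the set-level statements.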


F1000Research ◽  
2018 ◽  
Vol 5 ◽  
pp. 1408 ◽  
Author(s):  
Charity W. Law ◽  
Monther Alhamdoosh ◽  
Shian Su ◽  
Xueyi Dong ◽  
Luyi Tian ◽  
...  

The ability to easily and efficiently analyse RNA-sequencing data is a key strength of the Bioconductor project. Starting with counts summarised at the gene-level, a typical analysis involves pre-processing, exploratory data analysis, differential expression testing and pathway analysis with the results obtained informing future experiments and validation studies. In this workflow article, we analyse RNA-sequencing data from the mouse mammary gland, demonstrating use of the popular edgeR package to import, organise, filter and normalise the data, followed by the limma package with its voom method, linear modelling and empirical Bayes moderation to assess differential expression and perform gene set testing. This pipeline is further enhanced by the Glimma package which enables interactive exploration of the results so that individual samples and genes can be examined by the user. The complete analysis offered by these three packages highlights the ease with which researchers can turn the raw counts from an RNA-sequencing experiment into biological insights using Bioconductor.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2010 ◽  
Author(s):  
Monther Alhamdoosh ◽  
Charity W. Law ◽  
Luyi Tian ◽  
Julie M. Sheridan ◽  
Milica Ng ◽  
...  

Gene set enrichment analysis is a popular approach for prioritising the biological processes perturbed in genomic datasets. The Bioconductor project hosts over 80 software packages capable of gene set analysis. Most of these packages search for enriched signatures amongst differentially regulated genes to reveal higher level biological themes that may be missed when focusing only on evidence from individual genes. With so many different methods on offer, choosing the best algorithm and visualization approach can be challenging. The EGSEA package solves this problem by combining results from up to 12 prominent gene set testing algorithms to obtain a consensus ranking of biologically relevant results. This workflow demonstrates how EGSEA can extend limma-based differential expression analyses for RNA-seq and microarray data using experiments that profile 3 distinct cell populations important for studying the origins of breast cancer. Following data normalization and set-up of an appropriate linear model for differential expression analysis, EGSEA builds gene signature specific indexes that link a wide range of mouse or human gene set collections obtained from MSigDB, GeneSetDB and KEGG to the gene expression data being investigated. EGSEA is then configured and the ensemble enrichment analysis run, returning an object that can be queried using several S4 methods for ranking gene sets and visualizing results via heatmaps, KEGG pathway views, GO graphs, scatter plots and bar plots. Finally, an HTML report that combines these displays can fast-track the sharing of results with collaborators, and thus expedite downstream biological validation. EGSEA is simple to use and can be easily integrated with existing gene expression analysis pipelines for both human and mouse data.
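
The idea of a consensus ranking across several gene set testing methods can be sketched generically in Python. The example below uses plain mean-rank aggregation, which is an assumption chosen for illustration and not necessarily how EGSEA combines its results. The function name `consensus_ranking`, the method labels and the gene set names are all hypothetical.

```python
def consensus_ranking(rankings):
    """Order gene sets by their mean rank across methods.

    rankings maps a method label to a list of gene set names,
    ordered from most to least significant for that method.
    """
    if not rankings:
        raise ValueError("need at least one ranking")
    gene_sets = set(next(iter(rankings.values())))
    totals = {name: 0 for name in gene_sets}
    for method, ordered in rankings.items():
        if set(ordered) != gene_sets or len(ordered) != len(gene_sets):
            raise ValueError(f"{method} does not rank every gene set exactly once")
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    n_methods = len(rankings)
    # Ties in mean rank are broken alphabetically for a stable result
    return sorted(totals, key=lambda name: (totals[name] / n_methods, name))


# Hypothetical rankings from three methods over three gene sets
example = {
    "method_a": ["set_A", "set_B", "set_C"],
    "method_b": ["set_B", "set_A", "set_C"],
    "method_c": ["set_A", "set_C", "set_B"],
}
print(consensus_ranking(example))
```

Mean rank is only one possible aggregation rule. Medians or combined p-values are common alternatives, and the choice affects how strongly a single dissenting method can move a gene set.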

