Distribution-based comprehensive evaluation of methods for differential expression analysis in metatranscriptomics

2021 ◽  
Author(s):  
Hunyong Cho ◽  
Chuwen Liu ◽  
Bridget Mengshan Lin ◽  
Boyang Tang ◽  
Jeffrey Roach ◽  
...  

Background: Measuring and understanding the function of the human microbiome is key for several aspects of health; however, the development of statistical methods specifically for the analysis of microbial gene expression (i.e., metatranscriptomics) is in its infancy. Many currently employed differential expression analysis methods were designed for other data types and have not been evaluated in metatranscriptomics settings. To address this knowledge gap, we undertook a comprehensive evaluation and benchmarking of eight differential analysis methods for metatranscriptomics data. Results: We used a combination of real and simulated metatranscriptomics data to evaluate the performance (i.e., model fit, Type-I error, and statistical power) of eight methods: log-normal (LN), logistic-beta (LB), MAST, Kruskal-Wallis, two-part Kruskal-Wallis, DESeq2, ANCOM-BC, and metagenomeSeq. The simulation was informed by supragingival biofilm microbiome data from about 300 preschool-age children enrolled in a study of early childhood caries (ECC), whereas validations were sought in two additional datasets, one on ECC and one on inflammatory bowel disease. The LB test showed the highest power in both small and large sample sizes and reasonably controlled Type-I error. In contrast, MAST was hampered by inflated Type-I error. Using LN and LB tests, we found that genes C8PHV7 and C8PEV7, harbored by the lactate-producing Campylobacter gracilis, had the strongest association with ECC. Conclusion: These comprehensive model evaluation findings offer practical guidance for the selection of appropriate methods for rigorous analyses of differential expression in metatranscriptomics data. Selection of an optimal method is likely to increase the possibility of detecting true signals while minimizing the chance of claiming false ones.
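As a rough illustration of what estimating Type-I error by simulation involves, the sketch below applies a two-group Kruskal-Wallis test (one of the methods named above) to data simulated with no true group difference and reports the fraction of false rejections. It is not the authors' simulation design: the zero-inflated log-normal generator, its parameters (`zero_prob`, `mu`, `sigma`), the sample sizes, and all function names are assumptions chosen for illustration, and the p-value relies on the chi-square approximation with one degree of freedom.

```python
import math
import random


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for a two-group Kruskal-Wallis test with tie correction."""
    pooled = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    n = len(pooled)
    rank_sums = [0.0, 0.0]
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j
    correction = 1.0 - tie_term / (n**3 - n)
    if correction == 0.0:  # every observation identical: no evidence at all
        return 0.0, 1.0
    h = 12.0 / (n * (n + 1)) * (
        rank_sums[0] ** 2 / len(a) + rank_sums[1] ** 2 / len(b)
    ) - 3.0 * (n + 1)
    h = max(h / correction, 0.0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2.0))
    return h, p


def zero_inflated_lognormal(rng, n, zero_prob=0.5, mu=0.0, sigma=1.0):
    """Hypothetical expression generator: zeros mixed with log-normal values."""
    return [
        0.0 if rng.random() < zero_prob else rng.lognormvariate(mu, sigma)
        for _ in range(n)
    ]


def empirical_type1_error(n_per_group=30, n_sim=500, alpha=0.05, seed=1):
    """Fraction of null simulations in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        a = zero_inflated_lognormal(rng, n_per_group)
        b = zero_inflated_lognormal(rng, n_per_group)  # same distribution as a
        _, p = kruskal_wallis_two_groups(a, b)
        rejections += p < alpha
    return rejections / n_sim


if __name__ == "__main__":
    print("empirical Type-I error:", empirical_type1_error())
```

A well-calibrated test should give a rejection fraction close to `alpha`; a value well above it corresponds to the inflated Type-I error described for MAST.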

2021 ◽  
Author(s):  
Zihan Cui ◽  
Yuhang Liu ◽  
Jinfeng Zhang ◽  
Xing Qiu

Background: We developed super-delta2, a differential gene expression analysis pipeline designed for multi-group comparisons for RNA-seq data. It includes a customized one-way ANOVA F-test and a post-hoc test for pairwise group comparisons; both are designed to work with a multivariate normalization procedure to reduce technical noise. It also includes a trimming procedure with bias-correction to obtain robust and approximately unbiased summary statistics used in these tests. We demonstrated the asymptotic applicability of super-delta2 to log-transformed read counts in RNA-seq data by large sample theory based on Negative Binomial Poisson (NBP) distribution. Results: We compared super-delta2 with three commonly used RNA-seq data analysis methods: limma/voom, edgeR, and DESeq2 using both simulated and real datasets. In all three simulation settings, super-delta2 not only achieved the best overall statistical power, but also was the only method that controlled type I error at the nominal level. When applied to a breast cancer dataset to identify differential expression pattern associated with multiple pathologic stages, super-delta2 selected more enriched pathways than other methods, which are directly linked to the underlying biological condition (breast cancer). Conclusions: By incorporating trimming and bias-correction in the normalization step, super-delta2 was able to achieve tight control of type I error. Because the hypothesis tests are based on asymptotic normal approximation of the NBP distribution, super-delta2 does not require computationally expensive iterative optimization procedures used by methods such as edgeR and DESeq2, which occasionally have convergence issues.


Author(s):  
Sagar Utturkar ◽  
Asela Dassanayake ◽  
Shilpa Nagaraju ◽  
Steven D. Brown

2020 ◽  
Vol 36 (9) ◽  
pp. 2657-2664 ◽  
Author(s):  
Md Amanullah ◽  
Mengqian Yu ◽  
Xiwei Sun ◽  
Aoran Luo ◽  
Qing Zhou ◽  
...  

Motivation: miRNA isoforms (isomiRs) are produced from the same arm as the archetype miRNA with a few nucleotides different at 5′ and/or 3′ termini. These well-conserved isomiRs are functionally important and have contributed to the evolution of miRNA genes. Accurate detection of differential expression of miRNAs can bring new insights into the cellular function of miRNA and a further improvement in miRNA-based diagnostic and prognostic applications. However, very few methods take isomiR variations into account in the analysis of miRNA differential expression. Results: To overcome this challenge, we developed a novel approach to take advantage of the multidimensional structure of isomiR data from the same miRNAs, termed multivariate differential expression by Hotelling’s T² test (MDEHT). The utilization of the information hidden in isomiRs enables MDEHT to increase the power of identifying differentially expressed miRNAs that are not marginally detectable in univariate testing methods. We conducted rigorous and unbiased comparisons of MDEHT with seven commonly used tools in simulated and real datasets from The Cancer Genome Atlas. Our comprehensive evaluations demonstrated that the MDEHT method was robust among various datasets and outperformed other commonly used tools in terms of Type I error rate, true positive rate and reproducibility. Availability and implementation: The source code for identifying and quantifying isomiRs and performing miRNA differential expression analysis is available at https://github.com/amanzju/MDEHT. Supplementary information: Supplementary data are available at Bioinformatics online.
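For readers unfamiliar with the test named above, the following is a minimal sketch of the textbook two-sample Hotelling's T² statistic and its F-scaled form. It is not the MDEHT implementation from the linked repository: the function names are hypothetical, each row is assumed to be one sample and each column one isomiR of the same miRNA (for example, log-transformed expression), and equal covariance matrices across the two groups are assumed.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products of the centred rows (p x p matrix)."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [v - m for v, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def hotelling_t2(group1, group2):
    """Return (T2, F) for two groups of p-dimensional observations.

    Under the null hypothesis and multivariate normality, F follows an
    F distribution with (p, n1 + n2 - p - 1) degrees of freedom.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean_vector(group1), mean_vector(group2)
    p = len(m1)
    s1, s2 = scatter_matrix(group1, m1), scatter_matrix(group2, m2)
    pooled = [
        [(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)]
        for i in range(p)
    ]
    diff = [a - b for a, b in zip(m1, m2)]
    w = solve(pooled, diff)  # pooled^{-1} * diff without forming the inverse
    t2 = (n1 * n2) / (n1 + n2) * sum(d * wi for d, wi in zip(diff, w))
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat
```

With a single column the statistic reduces to the square of the pooled-variance two-sample t statistic, which gives a quick sanity check. Converting `F` to a p-value requires the F distribution, which the Python standard library does not provide.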


Author(s):  
Shiyi Liu ◽  
Zitao Wang ◽  
Ronghui Zhu ◽  
Feiyan Wang ◽  
Yanxiang Cheng ◽  
...  

2021 ◽  
Author(s):  
Lis Arend ◽  
Judith Bernett ◽  
Quirin Manz ◽  
Melissa Klug ◽  
Olga Lazareva ◽  
...  

Cytometry techniques are widely used to discover cellular characteristics at single-cell resolution. Many data analysis methods for cytometry data focus solely on identifying subpopulations via clustering and testing for differential cell abundance. For differential expression analysis of markers between conditions, only a few tools exist. These tools either reduce the data distribution to medians, discarding valuable information, or have underlying assumptions that may not hold for all expression patterns. Here, we systematically evaluated existing and novel approaches for differential expression analysis on real and simulated CyTOF data. We found that methods using median marker expressions produce fast and reliable results when the data are not strongly zero-inflated. Methods using all data detect changes in strongly zero-inflated markers, but partially suffer from overprediction or cannot handle big datasets. We present a new method, CyEMD, based on calculating the Earth Mover's Distance between expression distributions, which can handle strong zero-inflation without being too sensitive. Additionally, we developed CYANUS, a user-friendly R Shiny App allowing the user to analyze cytometry data with state-of-the-art tools, including well-performing methods from our comparison. A public web interface is available at https://exbio.wzw.tum.de/cyanus/.
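To make the contrast between median-based summaries and a distribution-based distance concrete, here is a small sketch of the one-dimensional Earth Mover's Distance (Wasserstein-1) between two empirical samples, computed as the area between their empirical CDFs. It is not the CyEMD implementation: the function name `emd_1d` and the toy marker values are assumptions, and any binning, normalization, or significance testing that CyEMD performs is not reproduced.

```python
from bisect import bisect_right
from statistics import median


def emd_1d(xs, ys):
    """Wasserstein-1 distance between two 1-D empirical distributions."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    # The empirical CDFs only change at observed values, so integrate
    # |F_x - F_y| piecewise between consecutive distinct values.
    grid = sorted(set(xs_sorted) | set(ys_sorted))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_x = bisect_right(xs_sorted, left) / len(xs_sorted)
        cdf_y = bisect_right(ys_sorted, left) / len(ys_sorted)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


if __name__ == "__main__":
    # Hypothetical zero-inflated marker expression in two conditions.
    condition_a = [0, 0, 0, 0, 5]
    condition_b = [0, 0, 0, 3, 8]
    print("medians:", median(condition_a), median(condition_b))
    print("EMD:", emd_1d(condition_a, condition_b))
```

In the toy data both medians are zero, so a median-based comparison sees no difference, while the distance is positive because the non-zero tails differ.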


2021 ◽  
Author(s):  
Haiyang Jin

Analysis of variance (ANOVA) is one of the most popular statistical methods employed for data analysis in psychology and other fields. Nevertheless, ANOVA is frequently used as an exploratory approach, even in confirmatory studies with explicit hypotheses. Such misapplication may invalidate ANOVA conventions, resulting in reduced statistical power, and even threatening the validity of conclusions. This paper evaluates the appropriateness of ANOVA conventions, discusses the potential motivations possibly misunderstood by researchers, and provides practical suggestions. Moreover, this paper proposes to control the Type I error rate with Hypothesis-based Type I Error Rate to consider both the number of tests and their logical relationships in rejecting the null hypothesis. Furthermore, this paper introduces the simple interaction analysis, which can employ the most straightforward interaction to test a hypothesis of interest. Finally, pre-registration is recommended to provide clarity for the selection of appropriate ANOVA tests in both confirmatory and exploratory studies.

