Robust normalization and transformation techniques for constructing gene coexpression networks from RNA-seq data

2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Kayla A. Johnson ◽  
Arjun Krishnan

Abstract Background: Constructing gene coexpression networks is a powerful approach for analyzing high-throughput gene expression data towards module identification, gene function prediction, and disease-gene prioritization. While optimal workflows for constructing coexpression networks, including good choices for data pre-processing, normalization, and network transformation, have been developed for microarray-based expression data, such well-tested choices do not exist for RNA-seq data. Almost all studies that compare data processing and normalization methods for RNA-seq focus on the end goal of determining differential gene expression. Results: Here, we present a comprehensive benchmarking and analysis of 36 different workflows, each with a unique set of normalization and network transformation methods, for constructing coexpression networks from RNA-seq datasets. We test these workflows on both large, homogenous datasets and small, heterogeneous datasets from various labs. We analyze the workflows in terms of aggregate performance, individual method choices, and the impact of multiple dataset experimental factors. Our results demonstrate that between-sample normalization has the biggest impact, with counts adjusted by size factors producing networks that most accurately recapitulate known tissue-naive and tissue-aware gene functional relationships. Conclusions: Based on this work, we provide concrete recommendations on robust procedures for building an accurate coexpression network from an RNA-seq dataset. In addition, researchers can examine all the results in great detail at https://krishnanlab.github.io/RNAseq_coexpression to make appropriate choices for coexpression analysis based on the experimental factors of their RNA-seq dataset.
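To make "counts adjusted by size factors" concrete, the following is a minimal sketch of between-sample normalization followed by a correlation-based network. The abstract does not say how the size factors are estimated, so the median-of-ratios estimator used here is an assumption, as are the function names, the log2(x + 1) transform, the choice of Pearson correlation, and the toy counts. It illustrates the general idea only and is not the authors' workflow.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors (assumed estimator); counts is genes x samples."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with nonzero counts in every sample so the logs are finite.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # The per-gene log geometric mean across samples serves as a pseudo-reference.
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # A sample's size factor is the median ratio of its counts to the reference.
    return np.exp(np.median(log_counts - log_ref, axis=0))

def coexpression_network(counts):
    """Gene-by-gene Pearson correlation of size-factor-adjusted log counts."""
    counts = np.asarray(counts, dtype=float)
    adjusted = counts / size_factors(counts)  # divide each sample column by its factor
    return np.corrcoef(np.log2(adjusted + 1.0))  # rows (genes) are the variables

# Hypothetical toy data: 4 genes x 3 samples.
# Sample 2 is sample 1 sequenced at twice the depth.
toy_counts = np.array([
    [10, 20, 12],
    [100, 200, 90],
    [50, 100, 70],
    [30, 60, 25],
])
factors = size_factors(toy_counts)
network = coexpression_network(toy_counts)
```

In the toy data the second sample's size factor comes out at twice the first sample's, so the depth difference is removed before correlations are computed. The resulting matrix is a fully connected weighted network; any further network transformation would be applied to it afterwards.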

2020 ◽  
Author(s):  
Kayla A Johnson ◽  
Arjun Krishnan

Abstract Background: Constructing gene coexpression networks is a powerful approach for analyzing high-throughput gene expression data towards module identification, gene function prediction, and disease-gene prioritization. While optimal workflows for constructing coexpression networks – including good choices for data pre-processing, normalization, and network transformation – have been developed for microarray-based expression data, such well-tested choices do not exist for RNA-seq data. Almost all studies that compare data processing/normalization methods for RNA-seq focus on the end goal of determining differential gene expression. Results: Here, we present a comprehensive benchmarking and analysis of 30 different workflows, each with a unique set of normalization and network transformation methods, for constructing coexpression networks from RNA-seq datasets. We tested these workflows on both large, homogenous datasets (Genotype-Tissue Expression project) and small, heterogeneous datasets from various labs (submitted to the Sequence Read Archive). We analyzed the workflows in terms of aggregate performance, individual method choices, and the impact of multiple dataset experimental factors. Our results demonstrate that between-sample normalization has the biggest impact, with trimmed mean of M-values or upper quartile normalization producing networks that most accurately recapitulate known tissue-naive and tissue-specific gene functional relationships. Conclusions: Based on this work, we provide concrete recommendations on robust procedures for building an accurate coexpression network from an RNA-seq dataset. In addition, researchers can examine all the results in great detail at https://krishnanlab.github.io/norm_for_RNAseq_coexp to make appropriate choices for coexpression analysis based on the experimental factors of their RNA-seq dataset.


2020 ◽  
Author(s):  
Haijing Jin ◽  
Zhandong Liu

Abstract Deconvolution analyses have been widely used to track compositional alterations of cell types in gene expression data. Even though numerous novel methods have been developed in recent years, researchers still have difficulty selecting optimal deconvolution methods because comprehensive benchmarks covering the newly developed methods are lacking. To systematically reveal the pitfalls and challenges of deconvolution analyses, we studied the impact of several technical and biological factors such as simulation model, quantification unit, component number, weight matrix, and unknown content by constructing three benchmarking frameworks that cover comparative analysis of 11 popular deconvolution methods under 1,766 conditions. We hope this study can provide new insights to researchers for future application, standardization, and development of deconvolution tools on RNA-seq data.


Author(s):  
Fabricio Almeida-Silva ◽  
Kanhu C Moharana ◽  
Thiago M Venancio

Abstract In the past decade, over 3000 samples of soybean transcriptomic data have accumulated in public repositories. Here, we review the state of the art in soybean transcriptomics, highlighting the major microarray and RNA-seq studies that investigated soybean transcriptional programs in different tissues and conditions. Further, we propose approaches for integrating such big data using gene coexpression networks and outline important web resources that may facilitate soybean data acquisition and analysis, contributing to the acceleration of soybean breeding and functional genomics research.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Li Tong ◽  
Po-Yen Wu ◽  
John H. Phan ◽  
Hamid R. Hassazadeh ◽  
...  

Abstract To use next-generation sequencing technology such as RNA-seq for medical and health applications, choosing proper analysis methods for biomarker identification remains a critical challenge for most users. The US Food and Drug Administration (FDA) has led the Sequencing Quality Control (SEQC) project to conduct a comprehensive investigation of 278 representative RNA-seq data analysis pipelines consisting of 13 sequence mapping, three quantification, and seven normalization methods. In this article, we focused on the impact of the joint effects of RNA-seq pipelines on gene expression estimation as well as the downstream prediction of disease outcomes. First, we developed and applied three metrics (i.e., accuracy, precision, and reliability) to quantitatively evaluate each pipeline’s performance on gene expression estimation. We then investigated the correlation between the proposed metrics and the downstream prediction performance using two real-world cancer datasets (i.e., SEQC neuroblastoma dataset and the NIH/NCI TCGA lung adenocarcinoma dataset). We found that RNA-seq pipeline components jointly and significantly impacted the accuracy of gene expression estimation, and its impact was extended to the downstream prediction of these cancer outcomes. Specifically, RNA-seq pipelines that produced more accurate, precise, and reliable gene expression estimation tended to perform better in the prediction of disease outcome. In the end, we provided scenarios as guidelines for users to use these three metrics to select sensible RNA-seq pipelines for the improved accuracy, precision, and reliability of gene expression estimation, which lead to the improved downstream gene expression-based prediction of disease outcome.


2020 ◽  
Author(s):  
Colin Peter Singer Kruse ◽  
Alexander D Meyers ◽  
Proma Basu ◽  
Sarahann Hutchinson ◽  
Darron R Luesse ◽  
...  

Abstract Background: Understanding of gravity sensing and response is critical to long-term human habitation in space and can provide new advantages for terrestrial agriculture. To this end, the altered gene expression profile induced by microgravity has been repeatedly queried by microarray and RNA-seq experiments to understand gravitropism. However, the quantification of altered protein abundance in space has been minimally investigated. Results: Proteomic (iTRAQ-labelled LC-MS/MS) and transcriptomic (RNA-seq) analyses simultaneously quantified protein and transcript differential expression of three-day-old, etiolated Arabidopsis thaliana seedlings grown aboard the International Space Station along with their ground control counterparts. Protein extracts were fractionated to isolate soluble and membrane proteins and analyzed to detect differentially phosphorylated peptides. In total, 968 RNAs, 107 soluble proteins, and 103 membrane proteins were identified as differentially expressed. In addition, the proteomic analyses identified 16 differential phosphorylation events. Proteomic data delivered novel insights and simultaneously provided new context to previously made observations of gene expression in microgravity. There is a sweeping shift in post-transcriptional mechanisms of gene regulation including RNA-decapping protein DCP5, the splicing factors GRP7 and GRP8, and AGO4. These data also indicate AHA2 and FERONIA as well as CESA1 and SHOU4 as central to the cell wall adaptations seen in spaceflight. Patterns of tubulin-α 1, 3, 4, and 6 phosphorylation further reveal an interaction of microtubule and redox homeostasis that mirrors osmotic response signaling elements. The absence of gravity also results in a seemingly wasteful dysregulation of plastid gene transcription.
Conclusions: The datasets gathered from Arabidopsis seedlings exposed to microgravity revealed marked impacts on post-transcriptional regulation, cell wall synthesis, redox/microtubule dynamics, and plastid gene transcription. The impact of post-transcriptional regulatory alterations represents an unstudied element of the plant microgravity response with the potential to significantly impact plant growth efficiency and beyond. What’s more, addressing the effects of microgravity on AHA2, CESA1, and alpha tubulins has the potential to enhance cytoskeletal organization and cell wall composition, thereby enhancing biomass production and growth in microgravity. Finally, understanding and manipulating the dysregulation of plastid gene transcription has further potential to address the goal of enhancing plant growth in the stressful conditions of microgravity.


2019 ◽  
Vol 36 (7) ◽  
pp. 1373-1383 ◽  
Author(s):  
Longjun Wu ◽  
Kailey E Ferger ◽  
J David Lambert

Abstract It has been proposed that animals have a pattern of developmental evolution resembling an hourglass because the most conserved development stage—often called the phylotypic stage—is always in midembryonic development. Although the topic has been debated for decades, recent studies using molecular data such as RNA-seq gene expression data sets have largely supported the existence of periods of relative evolutionary conservation in middevelopment, consistent with the phylotypic stage and the hourglass concepts. However, so far this approach has only been applied to a limited number of taxa across the tree of life. Here, using established phylotranscriptomic approaches, we found a surprising reverse hourglass pattern in two molluscs and a polychaete annelid, representatives of the Spiralia, an understudied group that contains a large fraction of metazoan body plan diversity. These results suggest that spiralians have a divergent midembryonic stage, with more conserved early and late development, which is the inverse of the pattern seen in almost all other organisms where these phylotranscriptomic approaches have been reported. We discuss our findings in light of proposed reasons for the phylotypic stage and hourglass model in other systems.


2019 ◽  
Vol 15 (2) ◽  
pp. e1006792 ◽  
Author(s):  
Brandon Monier ◽  
Adam McDermaid ◽  
Cankun Wang ◽  
Jing Zhao ◽  
Allison Miller ◽  
...  

Author(s):  
D Fumagalli ◽  
B Haibe-Kains ◽  
S Michiels ◽  
DN Brown ◽  
D Gacquer ◽  
...  

2019 ◽  
Vol 2019 ◽  
pp. 1-12
Author(s):  
Shan Lin ◽  
Zhicheng Zou ◽  
Cuibing Zhou ◽  
Hancheng Zhang ◽  
Zhiming Cai

Caterpillar fungus is a well-known fungal Chinese medicine. To reveal molecular changes during early and late stages of adenosine biosynthesis, transcriptome analysis was performed with the anamorph strain of caterpillar fungus. A total of 2,764 differentially expressed genes (DEGs) were identified (p ≤ 0.05, |log2 Ratio| ≥ 1), of which 1,737 were up-regulated and 1,027 were down-regulated. Gene expression profiling on days 4–10 revealed a distinct shift in expression of the purine metabolism pathway. Differential expression of 17 selected DEGs involved in purine metabolism (map00230) was validated by qPCR, and the expression trends were consistent with the RNA-Seq results. Subsequently, the predicted adenosine biosynthesis pathway, combined with the qPCR and RNA-Seq gene expression data, indicated that the increased adenosine accumulation results from down-regulation of the ndk, ADK, and APRT genes combined with up-regulation of the AK gene. This study will be valuable for understanding the molecular mechanisms of adenosine biosynthesis in caterpillar fungus.
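The DEG criterion quoted above (p ≤ 0.05, |log2 Ratio| ≥ 1) can be sketched as a simple filter. The function name, the default parameter names, and the example values below are hypothetical and serve only to illustrate the thresholds; they are not taken from the study's data or code.

```python
def classify_deg(p_value, log2_ratio, p_max=0.05, min_abs_log2=1.0):
    """Label a gene using the thresholds p <= p_max and |log2 ratio| >= min_abs_log2."""
    if p_value > p_max or abs(log2_ratio) < min_abs_log2:
        return "not significant"
    # A positive log2 ratio means higher expression in the later condition.
    return "up" if log2_ratio > 0 else "down"

# Hypothetical (p-value, log2 ratio) pairs, not values from the study.
examples = {
    "gene_a": (0.01, 2.3),   # passes both thresholds, positive ratio
    "gene_b": (0.03, -1.0),  # passes both thresholds, negative ratio
    "gene_c": (0.20, 3.0),   # fails the p-value threshold
    "gene_d": (0.01, 0.5),   # fails the fold-change threshold
}
labels = {gene: classify_deg(p, lfc) for gene, (p, lfc) in examples.items()}
```

Both thresholds are inclusive, matching the "≤" and "≥" in the stated criterion, so a gene with a log2 ratio of exactly -1 counts as down-regulated.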

