Co-expression analysis is biased by a mean-correlation relationship

2020
Author(s):
Yi Wang
Stephanie C. Hicks
Kasper D. Hansen

Estimates of correlation between pairs of genes are commonly used in co-expression analysis to construct gene networks from gene expression data. Here, we show that in RNA-seq data, both bulk and single-cell, the distribution of such correlations depends on the expression level of the genes involved; we refer to this as a mean-correlation relationship. This dependence introduces a bias in co-expression analysis whereby highly expressed genes are more likely to be highly correlated. Such a relationship is not observed in protein-protein interaction data, suggesting that it does not reflect biology. Ignoring this bias can cause potentially biologically relevant pairs of lowly expressed genes, such as transcription factors, to be missed. To address this problem, we introduce spatial quantile normalization (SpQN), a method for normalizing local distributions in a correlation matrix. We show that SpQN removes the mean-correlation relationship and corrects the expression bias in network reconstruction.
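The core idea of SpQN, normalizing local distributions of the correlation matrix, can be illustrated with a simplified sketch. Assuming genes are sorted by mean expression and split into equal-sized, non-overlapping bins, each block of the matrix (all gene pairs drawn from two bins) is quantile-normalized, by rank, onto the values of one reference block on the diagonal. The published method uses its own binning and reference choice; the non-overlapping bins, the `ref_bin` parameter, and the function names here are hypothetical simplifications, not the SpQN package API.

```python
import math


def empirical_quantile(sorted_vals, p):
    """Quantile p (0 <= p <= 1) of a sorted list, using linear interpolation."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def block_quantile_normalize(corr, gene_means, n_bins, ref_bin):
    """Simplified SpQN-style normalization (hypothetical sketch).

    corr: symmetric correlation matrix as a list of lists.
    gene_means: mean expression of each gene (used only to order genes).
    Genes are sorted by mean expression and split into non-overlapping bins;
    every block of gene pairs is mapped, by rank, onto the quantiles of the
    diagonal reference block `ref_bin`.
    """
    n = len(corr)
    order = sorted(range(n), key=lambda g: gene_means[g])
    size = math.ceil(n / n_bins)
    bins = [order[k:k + size] for k in range(0, n, size)]

    def block_cells(a, b):
        # Gene pairs in block (a, b); in a diagonal block, use each pair once
        # and skip the self-correlations on the main diagonal.
        if a == b:
            members = bins[a]
            return [(i, j) for x, i in enumerate(members) for j in members[x + 1:]]
        return [(i, j) for i in bins[a] for j in bins[b]]

    ref = sorted(corr[i][j] for i, j in block_cells(ref_bin, ref_bin))
    if not ref:
        raise ValueError("reference bin needs at least two genes")

    out = [row[:] for row in corr]
    for a in range(len(bins)):
        for b in range(a, len(bins)):
            ranked = sorted(block_cells(a, b), key=lambda c: corr[c[0]][c[1]])
            m = len(ranked)
            for r, (i, j) in enumerate(ranked):
                # Rank r maps to quantile r / (m - 1), so the reference
                # block itself is left unchanged.
                p = r / (m - 1) if m > 1 else 0.5
                out[i][j] = out[j][i] = empirical_quantile(ref, p)
    return out
```

For example, with nine genes ordered by expression and `corr[i][j] = (i + j) / 20` off the diagonal, raw correlations rise with expression; after normalization with three bins and the middle bin as reference, every block is mapped onto the reference values between 0.35 and 0.45, so the mean-correlation trend disappears.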

2019
Author(s):
Adam H. Freedman
John M. Gaspar
Timothy B. Sackton

Background: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits of paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases. Results: At both the transcript and gene levels, 2×40 paired-end reads unequivocally provide expression estimates that are more highly correlated with 2×125 estimates than those from 1×75 reads; in nearly all cases, those correlations are also greater than for 1×125 reads, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based on 2×40 reads consistently outperform those using 1×75 reads. Conclusion: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads also give reasonably robust expression estimates and differential expression results at the isoform level.
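The comparison rests on simple arithmetic: a 2×40 read pair sequences 80 bases per fragment, close to the 75 bases of a 1×75 read, whereas a 1×125 read sequences 125. Each strategy is then judged by how closely its expression estimates track the 2×125 reference. The sketch below assumes Pearson correlation on log2(x + 1)-transformed per-gene estimates; the abstract does not specify the correlation measure or transformation, and the function names are hypothetical.

```python
import math


def bases_per_fragment(read_length, paired):
    """Sequenced bases per fragment: two reads for paired-end, one for single-end."""
    return read_length * (2 if paired else 1)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_correlation(estimates, reference, pseudocount=1.0):
    """Correlate log2-transformed expression estimates (e.g., per-gene TPM)
    from one sequencing strategy with those from a reference strategy."""
    log_est = [math.log2(e + pseudocount) for e in estimates]
    log_ref = [math.log2(r + pseudocount) for r in reference]
    return pearson(log_est, log_ref)


for label, length, paired in [("2x40", 40, True), ("1x75", 75, False), ("1x125", 125, False)]:
    print(label, bases_per_fragment(length, paired))
```

The loop prints 80, 75, and 125 bases per fragment for the three designs, which is why 1×125 has more sequenced bases than 2×40 yet, per the results above, usually correlates less well with the reference.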


2019
Vol 12 (1)
pp. 5-10
Author(s):
Sivagnanam Rajamanickam Mani Sekhar
Siddesh Gaddadevara Matt
Sunilkumar S. Manvi
Srinivasa Krishnarajanagar Gopalalyengar

Background: Essential proteins are important for drug design, cell development, and the survival of living organisms. Various methods have been developed to predict essential proteins using topological and biological features. Objective: Predicting essential proteins effectively and in a timely manner remains challenging, because predictions based on protein-protein interaction data depend on the correctness of the network. Methods: In the proposed solution, two approaches, Mean Weighted Average and Recursive Feature Elimination, are used to predict essential proteins and are compared to select the better one. In Mean Weighted Average, data from consecutive slots are aggregated to obtain the nearest value, which is taken to identify the best proteins for the slot, whereas in Recursive Feature Elimination the whole dataset is split into different slots and the essential proteins for each slot are determined. Results: The accuracy of Recursive Feature Elimination is at least nine percent higher than that of Mean Weighted Average and betweenness centrality. Conclusion: Essential proteins are encoded by genes that are essential for the survival of living organisms and are important for drug design. Different approaches have been proposed to predict essential proteins using either experimental or computational methods. The experimental results show that the proposed approach performs better than other approaches.
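Betweenness centrality, one of the baselines named in the results, scores each protein by how many shortest paths between other proteins pass through it, on the topological assumption that such bottleneck proteins are more likely to be essential. A minimal sketch of Brandes' algorithm for an unweighted, undirected protein-protein interaction network follows; the adjacency-dict format, the protein names, and `rank_candidates` are illustrative assumptions, and this is not the authors' Mean Weighted Average or Recursive Feature Elimination method.

```python
from collections import deque


def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted, undirected graph.

    adj: node -> list of neighbouring nodes (must be symmetric).
    """
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each undirected shortest path was counted once from each endpoint.
    return {v: c / 2 for v, c in cb.items()}


def rank_candidates(adj, k):
    """Top-k proteins by betweenness (ties broken by name): a purely topological ranking."""
    scores = betweenness(adj)
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]
```

In a toy network where hub P1 links P2 to P5 and P4 also links P5, P1 scores 5.0 (one for each of the five leaf pairs without a direct edge) and every other protein scores 0.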


2021
Vol 15 (1)
Author(s):
Zeeshan Ahmed
Eduard Gibert Renart
Saman Zeeshan
XinQi Dong

Background: Genetic predisposition is considered critical for identifying subjects at high risk of developing disease. Investigating disease-causing genes, as well as highly and lowly expressed genes, can help uncover the root causes of uncertainties in patient care. However, independent and timely analysis of high-throughput next-generation sequencing data remains a challenge for non-computational biologists and geneticists. Results: In this manuscript, we present a findable, accessible, interactive, and reusable (FAIR) bioinformatics platform, GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform database application for annotating variable and complex RNA-seq-driven gene-disease data and for expression analysis with a dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. GVViZ is operated through a set of simple instructions that users without a computational background can follow to design and perform customized data analyses. It can integrate patients’ transcriptomics data with public, proprietary, and our in-house gene-disease databases to query, explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analyses of different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders.
The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases. Conclusions: We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. Clinicians and life-science researchers can use GVViZ to visualize and interpret transcriptomics data, making it a powerful tool for studying the dynamics of gene expression and regulation. Furthermore, if successfully deployed in clinical settings, GVViZ could enable high-throughput correlation of patient diagnoses based on clinical and transcriptomics data.


2021
Vol 7 (1)
Author(s):
Albert T. Young
Xavier Carette
Michaela Helmel
Hanno Steen
Robert N. Husson
...  

The ability of Mycobacterium tuberculosis (Mtb) to adapt to diverse stresses in its host environment is crucial for pathogenesis. Two essential Mtb serine/threonine protein kinases, PknA and PknB, regulate cell growth in response to environmental stimuli, but little is known about their downstream effects. By combining publicly available ChIP-Seq and protein–protein interaction data for transcription factors (TFs) with RNA-Seq data obtained after treatment with either an inhibitor of both PknA and PknB or an inactive control, we show that the Mtb TF regulatory network propagates the effects of kinase inhibition, leading to widespread changes in regulatory programs involved in cell wall integrity, stress response, and energy production, among others. We also observe that changes in TF regulatory activity correlate with kinase-specific phosphorylation of those TFs. In addition to characterizing the downstream regulatory effects of PknA/PknB inhibition, this work demonstrates the need for regulatory network approaches that can incorporate signal-driven transcription factor modifications.
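Changes in TF regulatory activity are not measured directly by RNA-Seq; they have to be inferred from the expression changes of each TF's targets in the regulatory network. The abstract does not describe the inference method, so the sketch below uses a deliberately simple stand-in: a TF's activity change is the mean of its targets' log2 fold changes (inhibitor versus control), with the sign flipped for repressed targets. The network format, edge signs, and gene names are hypothetical.

```python
from statistics import mean


def tf_activity(network, log2fc):
    """Score each TF by the mean signed log2 fold change of its measured targets.

    network: TF -> {target gene: +1 if activated by the TF, -1 if repressed}
    log2fc:  gene -> log2 fold change (kinase inhibitor vs. inactive control)
    TFs with no measured targets are omitted from the result.
    """
    scores = {}
    for tf, targets in network.items():
        signed = [sign * log2fc[g] for g, sign in targets.items() if g in log2fc]
        if signed:
            scores[tf] = mean(signed)
    return scores
```

With `network = {"tfA": {"g1": 1, "g2": -1}, "tfB": {"g3": 1}}` and fold changes of 2.0, -1.0, and -0.5 for g1, g2, and g3, tfA scores 1.5 (its activated target rises and its repressed target falls, both consistent with higher activity) and tfB scores -0.5. Scores of this kind could then be correlated with per-TF phosphorylation changes.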


2021
Vol 22 (1)
Author(s):
Matthew Chung
Vincent M. Bruno
David A. Rasko
Christina A. Cuomo
José F. Muñoz
...  

Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.
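One way to see why the downstream steps need modification: each organism's share of the reads can vary from sample to sample, so a library size computed over the whole sample conflates changes in organism abundance with changes in expression. The sketch below assumes reads were quantified against a combined multi-species reference and that each feature is labelled with its species; it computes counts per million (CPM) using species-specific totals. CPM is used only for illustration and is not necessarily the normalization the authors recommend; feature names are hypothetical.

```python
def per_species_cpm(counts, species_of):
    """Counts per million, with the library size computed separately per species.

    counts:     feature -> read count from a combined multi-species reference
    species_of: feature -> species label (e.g., "host" or "pathogen")
    """
    totals = {}
    for feature, count in counts.items():
        species = species_of[feature]
        totals[species] = totals.get(species, 0) + count
    return {
        feature: count * 1e6 / totals[species_of[feature]] if totals[species_of[feature]] else 0.0
        for feature, count in counts.items()
    }
```

For a sample with 1,000 host reads and only 10 pathogen reads, a pathogen gene holding half of the pathogen reads gets 500,000 CPM here, whereas dividing by the whole-sample total would shrink it to roughly 5,000 and make it depend on how much host RNA happened to be present.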


Plants
2021
Vol 10 (7)
pp. 1465
Author(s):
Ramon de Koning
Raphaël Kiekens
Mary Esther Muyoka Toili
Geert Angenon

Raffinose family oligosaccharides (RFOs) play an important role in plants but are also considered antinutritional factors. A thorough understanding of the galactinol and RFO biosynthetic gene families and of the expression patterns of the individual genes is a prerequisite for sustainably reducing the RFO content of seeds without compromising normal plant development and functioning. In this paper, we give an overview of the annotation and genetic structure of all galactinol- and RFO biosynthesis genes in soybean and common bean. In common bean, three galactinol synthase genes, two raffinose synthase genes, and one stachyose synthase gene were identified for the first time. To characterize the expression patterns of these genes in different tissues, two expression atlases were created by re-analyzing publicly available RNA-seq data. De novo expression analysis through an RNA-seq study of seed development in three common bean varieties gave further insight into the expression patterns of these genes during seed development. The results of the expression analysis suggest that different classes of galactinol- and RFO synthase genes have tissue-specific expression patterns in soybean and common bean. Based on this knowledge, we identify the galactinol- and RFO synthase genes that play a key role in the accumulation of RFOs in seeds. These candidate genes may play a pivotal role in reducing the RFO content in the seeds of important legumes, which could improve the nutritional quality of these beans and alleviate the discomfort associated with their consumption.
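Tissue-specific expression patterns like these are often summarized with the tissue-specificity index tau, computed over a gene's expression values x_1, ..., x_n in n tissues as tau = sum over i of (1 - x_i / max(x)), divided by (n - 1). It is 0 when a gene is expressed equally in all tissues and 1 when it is expressed in a single tissue. The abstract does not say which measure the authors used, so tau here is an assumption, as are the example values.

```python
def tau(expression):
    """Tissue-specificity index tau for one gene.

    expression: non-negative expression values of the gene in n >= 2 tissues
    (often log-transformed first). Returns a value from 0 (uniform expression)
    to 1 (expression in a single tissue).
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    top = max(expression)
    if top <= 0:
        raise ValueError("tau is undefined for a gene with no expression")
    return sum(1 - x / top for x in expression) / (n - 1)
```

A gene detected only in seeds across four tissues, e.g. `[10, 0, 0, 0]`, gets tau = 1.0, while `[5, 5, 5]` gets 0.0, so candidate seed-specific synthase genes would sit near the top of a tau ranking.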


2021
Vol 22 (13)
pp. 6673
Author(s):
Xiaochao Qu
Mei Liao
Weiwei Liu
Yisheng Cai
Qiaorong Yi
...  

Wingless-type MMTV integration site family, member 16 (wnt16) is a wnt ligand that participates in the regulation of vertebrate skeletal development. Studies have shown that wnt16 can regulate bone metabolism, but the underlying molecular mechanism remains largely undefined. We generated a wnt16-/- zebrafish model by CRISPR-Cas9-mediated gene knockout; the resulting 11 bp deletion in wnt16 led to premature termination of translation and significantly reduced wnt16 expression. The expression of wnt16 in bone-related tissues was detected via in situ hybridization. Light microscopy and micro-CT analysis showed significant deformities of the head, spine, and tail, as well as decreased bone mineral density and trabecular bone, in wnt16-/- zebrafish. RNA sequencing was performed to identify differentially expressed genes (DEGs). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses showed that the down-regulated DEGs are mainly enriched in the mTOR, FoxO, and VEGF pathways. Protein–protein interaction (PPI) network analysis was performed on the detected DEGs. Eight down-regulated DEGs (akt1, bnip4, ptena, vegfaa, twsg1b, prkab1a, prkab1b, and pla2g4f.2) were validated by qRT-PCR, and the results were consistent with the RNA-seq data. Overall, our work provides key insights into the influence of the wnt16 gene on skeletal development.
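The link between an 11 bp deletion and premature termination follows from the reading frame: 11 is not a multiple of 3 (11 = 3 × 3 + 2), so every codon downstream of the deletion is read in a shifted frame, where an in-frame stop codon (TAA, TAG, or TGA) typically appears soon. The sketch below illustrates this on a short hypothetical coding sequence; it is not the wnt16 sequence, and the deletion position is chosen purely for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def causes_frameshift(indel_length):
    """An insertion or deletion shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


def first_stop_codon(cds):
    """0-based codon index of the first in-frame stop codon, or None if there is none."""
    for k in range(0, len(cds) - 2, 3):
        if cds[k:k + 3] in STOP_CODONS:
            return k // 3
    return None


def delete(seq, start, length):
    """Remove `length` bases starting at 0-based position `start`."""
    return seq[:start] + seq[start + length:]


# Hypothetical toy coding sequence: start codon, 8 sense codons' worth of bases, stop.
wild_type = "ATGGCCAAGCTGCATGACCAAAGGTAA"
mutant = delete(wild_type, 3, 11)
print(causes_frameshift(11), first_stop_codon(wild_type), first_stop_codon(mutant))
```

In this toy sequence the wild-type frame reaches its first stop at its final codon (index 8), while removing the 11 bases right after the start codon brings a TGA into frame at codon 1, truncating the protein.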

