Co-expression analysis is biased by a mean-correlation relationship

Abstract: Estimates of correlation between pairs of genes are commonly used in co-expression analysis to construct gene networks from gene expression data. Here, we show that the distribution of such correlations depends on the expression level of the genes involved, which we refer to as a mean-correlation relationship, in both bulk and single-cell RNA-seq data. This dependence introduces a bias in co-expression analysis whereby highly expressed genes are more likely to be highly correlated. Such a relationship is not observed in protein-protein interaction data, suggesting that it does not reflect biology. Ignoring this bias can lead to missing potentially biologically relevant pairs of lowly expressed genes, such as transcription factors. To address this problem, we introduce spatial quantile normalization (SpQN), a method for normalizing local distributions in a correlation matrix. We show that spatial quantile normalization removes the mean-correlation relationship and corrects the expression bias in network reconstruction.
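One way to look for the mean-correlation relationship described above is to bin genes by mean expression and compare the within-bin correlations. The following is a minimal diagnostic sketch under stated assumptions, not the SpQN method itself: the function names, the equal-size binning, and the toy matrix are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_by_expression_bin(expr, n_bins=3):
    """Group genes into bins of increasing mean expression and summarize
    the within-bin pairwise correlations.

    expr: dict mapping gene name -> list of expression values (one per sample).
    Returns a list of (bin_index, n_pairs, mean_abs_correlation).
    """
    genes = sorted(expr, key=lambda g: sum(expr[g]) / len(expr[g]))
    bin_size = math.ceil(len(genes) / n_bins)
    summary = []
    for b in range(n_bins):
        members = genes[b * bin_size:(b + 1) * bin_size]
        cors = [abs(pearson(expr[g], expr[h])) for g, h in combinations(members, 2)]
        if cors:  # bins with fewer than two genes have no pairs
            summary.append((b, len(cors), sum(cors) / len(cors)))
    return summary


# Hypothetical toy matrix: six genes measured in four samples.
toy_expr = {
    "gene_a": [1, 2, 3, 4],
    "gene_b": [2, 1, 2, 1],
    "gene_c": [10, 11, 12, 13],
    "gene_d": [10, 12, 11, 13],
    "gene_e": [100, 110, 120, 130],
    "gene_f": [200, 220, 240, 260],
}
for bin_index, n_pairs, mean_abs_cor in correlation_by_expression_bin(toy_expr):
    print(bin_index, n_pairs, round(mean_abs_cor, 3))
```

On real RNA-seq data, a summary that grows with the bin index would be consistent with the bias the abstract describes; the toy values here are constructed by hand and carry no evidential weight.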

Short paired-end reads trump long single-end reads for expression analysis

10.1101/777409 ◽ 2019

Author(s): Adam H. Freedman ◽ John M. Gaspar ◽ Timothy B. Sackton

Keyword(s): Experimental Design ◽ Differential Expression ◽ Expression Analysis ◽ Cost Effective ◽ RNA-Seq ◽ Cost Effective Approach ◽ Effective Manner ◽ Gene Level ◽ Highly Correlated ◽ Paired End Sequencing

Background: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases. Results: At both the transcript and gene levels, expression estimates from 2×40 paired-end reads are unequivocally more highly correlated with those from 2×125 reads than estimates from 1×75 reads are; in nearly all cases, those correlations are also greater than for 1×125, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2×40 consistently outperform those using 1×75. Conclusion: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level.
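The phrase "approximately the same total number of sequenced bases" can be checked with simple arithmetic on the read layouts named in the abstract. The helper below is a hypothetical illustration (the function name and the string format are assumptions), counting bases sequenced per fragment as number of reads times read length.

```python
def bases_per_fragment(layout):
    """Bases sequenced per fragment for a layout such as '2x40' or '1×75'.

    The first number is the reads per fragment (1 = single-end, 2 = paired-end),
    the second is the read length in bases.
    """
    n_reads, read_length = layout.replace("×", "x").lower().split("x")
    return int(n_reads) * int(read_length)


# Layouts mentioned in the abstract.
for layout in ["2x40", "1x75", "1x125", "2x125"]:
    print(layout, bases_per_fragment(layout))
```

Under this count, 2×40 and 1×75 are close (80 versus 75 bases per fragment), while 1×125 sequences more bases than 2×40, which is the comparison the Results sentence refers to.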

Faculty Opinions recommendation of Analyzing yeast protein-protein interaction data obtained from different sources.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.1010262.240599 ◽ 2004

Author(s): Golan Yona

Keyword(s): Protein Interaction ◽ Protein Interaction Data ◽ Interaction Data ◽ Yeast Protein ◽ Protein Protein Interaction ◽ Different Sources

Faculty Opinions recommendation of Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.14267340.15779565 ◽ 2012

Author(s): Marylyn Ritchie ◽ Stephen Turner

Keyword(s): Expression Analysis ◽ Transcript Expression ◽ RNA-Seq ◽ Differential Gene

Identification of Essential Proteins in Yeast Using Mean Weighted Average and Recursive Feature Elimination

Recent Patents on Computer Science ◽ 10.2174/2213275911666180918155521 ◽ 2019 ◽ Vol 12 (1) ◽ pp. 5-10 ◽ Cited By ~ 5

Author(s): Sivagnanam Rajamanickam Mani Sekhar ◽ Siddesh Gaddadevara Matt ◽ Sunilkumar S. Manvi ◽ Srinivasa Krishnarajanagar Gopalalyengar

Keyword(s): Drug Design ◽ Weighted Average ◽ Living Organism ◽ Experimental Result ◽ Recursive Feature Elimination ◽ Protein Interaction Data ◽ Essential Proteins ◽ Protein Protein Interaction ◽ Result Show ◽ Better Than

Background: Essential proteins are significant for drug design, cell development, and the survival of living organisms. Different methods have been developed to predict essential proteins using topological and biological features. Objective: Predicting essential proteins effectively and in a timely manner remains a challenging task, as the availability of protein-protein interaction data depends on network correctness. Methods: In the proposed solution, two approaches, Mean Weighted Average and Recursive Feature Elimination, are used to predict essential proteins and are compared to select the better one. In Mean Weighted Average, data from consecutive slots are aggregated into a count, and the nearest value is taken as the indicator of the best proteins for the slot, whereas in Recursive Feature Elimination the whole data set is split into different slots and the essential proteins for each slot are determined. Results: The accuracy of Recursive Feature Elimination is at least nine percent higher than that of Mean Weighted Average and betweenness centrality. Conclusion: Essential proteins are encoded by genes that are essential for the survival of living beings and are relevant to drug design. Different approaches have been proposed to predict essential proteins using either experimental or computational methods. The experimental results show that the proposed work performs better than other approaches.
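For readers unfamiliar with the technique named in the Methods, recursive feature elimination in its generic form repeatedly fits a model, drops the feature with the smallest absolute weight, and refits on the remainder. The sketch below illustrates that generic loop only; it is not the authors' pipeline and does not reproduce their slot-based procedure. The linear model, the gradient-descent fit, the function names, and the toy data (three hypothetical features on a ±1 grid, with a target built from the first two) are all assumptions made for illustration.

```python
from itertools import product


def fit_linear(X, y, lr=0.1, n_iter=200):
    """Least-squares weights (no intercept) found by plain gradient descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - target
                     for row, target in zip(X, y)]
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, X))
                for j in range(p)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def recursive_feature_elimination(X, y, n_keep):
    """Refit, drop the feature with the smallest |weight|, and repeat.

    Assumes features are on comparable scales, so weights can be compared.
    Returns (kept column indices, eliminated column indices in removal order).
    """
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        w = fit_linear(X_active, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        eliminated.append(active.pop(weakest))
    return active, eliminated


# Hypothetical toy data: every combination of three features at -1 or +1.
# The target depends strongly on feature 0, weakly on feature 1, not on feature 2.
X = [list(row) for row in product([-1.0, 1.0], repeat=3)]
y = [3.0 * a + 1.0 * b for a, b, _ in X]

kept, dropped = recursive_feature_elimination(X, y, n_keep=1)
print("kept:", kept, "dropped in order:", dropped)
```

In an essential-protein setting the columns might be topological or biological features of each protein, but that mapping is an assumption here rather than something the abstract specifies.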

Advancing clinical genomics and precision medicine with GVViZ: FAIR bioinformatics platform for variable gene-disease annotation, visualization, and expression analysis

Human Genomics ◽ 10.1186/s40246-021-00336-1 ◽ 2021 ◽ Vol 15 (1)

Author(s): Zeeshan Ahmed ◽ Eduard Gibert Renart ◽ Saman Zeeshan ◽ XinQi Dong

Keyword(s): Data Analysis ◽ Patient Care ◽ Expression Analysis ◽ High Throughput ◽ Gene Annotation ◽ Next Generation Sequencing Data ◽ RNA-Seq ◽ Sequencing Data ◽ Complex Disorders ◽ Transcriptomics Data

Background: Genetic disposition is considered critical for identifying subjects at high risk for disease development. Investigating disease-causing genes and highly and lowly expressed genes can support finding the root causes of uncertainties in patient care. However, independent and timely high-throughput next-generation sequencing data analysis is still a challenge for non-computational biologists and geneticists. Results: In this manuscript, we present a findable, accessible, interactive, and reusable (FAIR) bioinformatics platform, i.e., GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform, and database application for RNA-seq-driven variable and complex gene-disease data annotation and expression analysis with a dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. The execution of GVViZ is based on a set of simple instructions that users without a computational background can follow to design and perform customized data analysis. It can assimilate patients’ transcriptomics data with public, proprietary, and our in-house developed gene-disease databases to query, easily explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analysis for different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders. The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases. Conclusions: We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. However, experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret the transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation. Furthermore, with successful deployment in clinical settings, GVViZ has the potential to enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.

Multi-omic regulatory networks capture downstream effects of kinase inhibition in Mycobacterium tuberculosis

npj Systems Biology and Applications ◽ 10.1038/s41540-020-00164-4 ◽ 2021 ◽ Vol 7 (1)

Author(s): Albert T. Young ◽ Xavier Carette ◽ Michaela Helmel ◽ Hanno Steen ◽ Robert N. Husson ◽ ...

Keyword(s): Transcription Factor ◽ Mycobacterium Tuberculosis ◽ Regulatory Network ◽ Regulatory Networks ◽ Protein Interaction Data ◽ Kinase Inhibition ◽ Protein Protein Interaction ◽ Downstream Effects ◽ Regulate Cell Growth ◽ Regulatory Effects

Abstract: The ability of Mycobacterium tuberculosis (Mtb) to adapt to diverse stresses in its host environment is crucial for pathogenesis. Two essential Mtb serine/threonine protein kinases, PknA and PknB, regulate cell growth in response to environmental stimuli, but little is known about their downstream effects. By combining RNA-Seq data, following treatment with either an inhibitor of both PknA and PknB or an inactive control, with publicly available ChIP-Seq and protein–protein interaction data for transcription factors, we show that the Mtb transcription factor (TF) regulatory network propagates the effects of kinase inhibition and leads to widespread changes in regulatory programs involved in cell wall integrity, stress response, and energy production, among others. We also observe that changes in TF regulatory activity correlate with kinase-specific phosphorylation of those TFs. In addition to characterizing the downstream regulatory effects of PknA/PknB inhibition, this demonstrates the need for regulatory network approaches that can incorporate signal-driven transcription factor modifications.

Best practices on the differential expression analysis of multi-species RNA-seq

Genome Biology ◽ 10.1186/s13059-021-02337-8 ◽ 2021 ◽ Vol 22 (1)

Author(s): Matthew Chung ◽ Vincent M. Bruno ◽ David A. Rasko ◽ Christina A. Cuomo ◽ José F. Muñoz ◽ ...

Keyword(s): Best Practices ◽ Differential Expression ◽ Expression Analysis ◽ Differential Expression Analysis ◽ Single Species ◽ RNA-Seq ◽ Species Analysis ◽ Differential Gene ◽ Multiple Species ◽ Downstream Analysis

Abstract: Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.

Identification and Expression Analysis of the Genes Involved in the Raffinose Family Oligosaccharides Pathway of Phaseolus vulgaris and Glycine max

Plants ◽ 10.3390/plants10071465 ◽ 2021 ◽ Vol 10 (7) ◽ pp. 1465

Author(s): Ramon de Koning ◽ Raphaël Kiekens ◽ Mary Esther Muyoka Toili ◽ Geert Angenon

Keyword(s): Common Bean ◽ Seed Development ◽ Expression Analysis ◽ De Novo ◽ Expression Patterns ◽ Gene Families ◽ RNA-Seq ◽ Raffinose Family Oligosaccharides ◽ Specific Expression ◽ Raffinose Synthase

Raffinose family oligosaccharides (RFOs) play an important role in plants but are also considered to be antinutritional factors. A profound understanding of the galactinol and RFO biosynthetic gene families and the expression patterns of the individual genes is a prerequisite for the sustainable reduction of the RFO content in the seeds, without compromising normal plant development and functioning. In this paper, an overview of the annotation and genetic structure of all galactinol and RFO biosynthesis genes is given for soybean and common bean. In common bean, three galactinol synthase genes, two raffinose synthase genes and one stachyose synthase gene were identified for the first time. To discover the expression patterns of these genes in different tissues, two expression atlases have been created through re-analysis of publicly available RNA-seq data. De novo expression analysis through an RNA-seq study of three varieties of common bean gave more insight into the expression patterns of these genes during seed development. The results of the expression analysis suggest that different classes of galactinol and RFO synthase genes have tissue-specific expression patterns in soybean and common bean. With the obtained knowledge, important galactinol and RFO synthase genes that specifically play a key role in the accumulation of RFOs in the seeds are identified. These candidate genes may play a pivotal role in reducing the RFO content in the seeds of important legumes, which could improve the nutritional quality of these beans and alleviate the discomfort associated with their consumption.

Loss of Wnt16 Leads to Skeletal Deformities and Downregulation of Bone Developmental Pathway in Zebrafish

International Journal of Molecular Sciences ◽ 10.3390/ijms22136673 ◽ 2021 ◽ Vol 22 (13) ◽ pp. 6673

Author(s): Xiaochao Qu ◽ Mei Liao ◽ Weiwei Liu ◽ Yisheng Cai ◽ Qiaorong Yi ◽ ...

Keyword(s): Gene Knockout ◽ Integration Site ◽ Skeletal Development ◽ RNA-Seq ◽ Developmental Pathway ◽ Mineral Density ◽ Zebrafish Model ◽ Protein Protein Interaction ◽ CT Analysis

Wingless-type MMTV integration site family, member 16 (wnt16), is a Wnt ligand that participates in the regulation of vertebrate skeletal development. Studies have shown that wnt16 can regulate bone metabolism, but its molecular mechanism remains largely undefined. We obtained a wnt16-/- zebrafish model using a CRISPR-Cas9-mediated gene knockout screen, with an 11 bp deletion in wnt16 that led to premature termination of translation and significantly reduced wnt16 expression. The expression of wnt16 in bone-related parts was detected via in situ hybridization. Light microscopy and micro-CT analysis showed that the head, spine, and tail of wnt16-/- zebrafish exhibited significant deformities, and that bone mineral density and trabecular bone were decreased. RNA sequencing was performed to explore the differentially expressed genes (DEGs). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses found that the down-regulated DEGs are mainly concentrated in the mTOR, FoxO, and VEGF pathways. Protein–protein interaction (PPI) network analysis was performed with the detected DEGs. Eight down-regulated DEGs, including akt1, bnip4, ptena, vegfaa, twsg1b, prkab1a, prkab1b, and pla2g4f.2, were validated by qRT-PCR, and the results were consistent with the RNA-seq data. Overall, our work provides key insights into the influence of the wnt16 gene on skeletal development.
