New gene association measures by joint network embedding of multiple gene expression datasets

2020 ◽  
Author(s):  
Guiying Wu ◽  
Xiangyu Li ◽  
Wenbo Guo ◽  
Zheng Wei ◽  
Tao Hu ◽  
...  

ABSTRACT A large number of samples is required to construct a reliable gene co-expression network, and the samples from a single gene expression dataset are generally not enough. However, batch effects may be widespread among datasets because of differing experimental conditions. We propose JEBIN (Joint Embedding of multiple BIpartite Networks), an algorithm that learns a low-dimensional representation vector for each gene by integrating multiple bipartite networks, each of which corresponds to one dataset. JEBIN has several inherent advantages: it is a nonlinear, global model; its time complexity is linear in the number of genes, datasets, or samples; and it can integrate datasets with different distributions. We verified the effectiveness and scalability of JEBIN through a series of simulation experiments and showed better performance on real biological data than commonly used integration algorithms. In addition, we conducted a differential co-expression analysis of hepatocellular carcinoma between single-cell and bulk RNA-seq data, as well as a contrast between hepatocellular carcinoma and its adjacent samples using the bulk RNA-seq data. The results indicate that JEBIN can obtain comprehensive and stable gene co-expression networks by integrating multiple datasets and has broad prospects in the functional annotation of unknown genes and the inference of regulatory mechanisms of target genes.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Guiomar Martín ◽  
Yamile Márquez ◽  
Federica Mantica ◽  
Paula Duque ◽  
Manuel Irimia

Abstract Background Alternative splicing (AS) is a widespread regulatory mechanism in multicellular organisms. Numerous transcriptomic and single-gene studies in plants have investigated AS in response to specific conditions, especially environmental stress, unveiling substantial amounts of intron retention that modulate gene expression. However, a comprehensive study contrasting stress-response and tissue-specific AS patterns and directly comparing them with those of animal models is still missing. Results We generate a massive resource for Arabidopsis thaliana, PastDB, comprising AS and gene expression quantifications across tissues, development and environmental conditions, including abiotic and biotic stresses. Harmonized analysis of these datasets reveals that A. thaliana shows high levels of AS, similar to fruitflies, and that, compared to animals, disproportionately uses AS for stress responses. We identify core sets of genes regulated specifically by either AS or transcription upon stresses or among tissues, a regulatory specialization that is tightly mirrored by the genomic features of these genes. Unexpectedly, non-intron retention events, including exon skipping, are overrepresented across regulated AS sets in A. thaliana, being also largely involved in modulating gene expression through NMD and uORF inclusion. Conclusions Non-intron retention events have likely been functionally underrated in plants. AS constitutes a distinct regulatory layer controlling gene expression upon internal and external stimuli whose target genes and master regulators are hardwired at the genomic level to specifically undergo post-transcriptional regulation. Given the higher relevance of AS in the response to different stresses when compared to animals, this molecular hardwiring is likely required for a proper environmental response in A. thaliana.


mSystems ◽  
2020 ◽  
Vol 5 (6) ◽  
Author(s):  
Kumari Sonal Choudhary ◽  
Julia A. Kleinmanns ◽  
Katherine Decker ◽  
Anand V. Sastry ◽  
Ye Gao ◽  
...  

ABSTRACT Escherichia coli uses two-component systems (TCSs) to respond to environmental signals. TCSs affect gene expression and are parts of E. coli’s global transcriptional regulatory network (TRN). Here, we identified the regulons of five TCSs in E. coli MG1655: BaeSR and CpxAR, which were stimulated by ethanol stress; KdpDE and PhoRB, induced by limiting potassium and phosphate, respectively; and ZraSR, stimulated by zinc. We analyzed RNA-seq data using independent component analysis (ICA). ChIP-exo data were used to validate condition-specific target gene binding sites. Based on these data, we do the following: (i) identify the target genes for each TCS; (ii) show how the target genes are transcribed in response to stimulus; and (iii) reveal novel relationships between TCSs, which indicate noncognate inducers for various response regulators, such as BaeR to iron starvation, CpxR to phosphate limitation, and PhoB and ZraR to cell envelope stress. Our understanding of the TRN in E. coli is thus notably expanded. IMPORTANCE E. coli is a common commensal microbe found in the human gut microenvironment; however, some strains cause diseases like diarrhea, urinary tract infections, and meningitis. E. coli’s two-component systems (TCSs) modulate target gene expression, especially related to virulence, pathogenesis, and antimicrobial peptides, in response to environmental stimuli. Thus, it is of utmost importance to understand the transcriptional regulation of TCSs to infer bacterial environmental adaptation and disease pathogenicity. Utilizing a combinatorial approach integrating RNA sequencing (RNA-seq), independent component analysis, chromatin immunoprecipitation coupled with exonuclease treatment (ChIP-exo), and data mining, we suggest five different modes of TCS transcriptional regulation. Our data further highlight noncognate inducers of TCSs, which emphasizes the cross-regulatory nature of TCSs in E. coli and suggests that TCSs may have a role beyond their cognate functionalities. In summary, these results can lead to an understanding of the metabolic capabilities of bacteria and correctly predict complex phenotype under diverse conditions, especially when further incorporated with genome-scale metabolic models.


Molecules ◽  
2018 ◽  
Vol 23 (9) ◽  
pp. 2331 ◽  
Author(s):  
Qianqian Zhang ◽  
Wei Liu ◽  
Yingli Cai ◽  
A-Feng Lan ◽  
Yinbing Bian

The reliability of qRT-PCR results depends on the stability of the reference genes used for normalization, so reference genes need to be identified before gene expression analysis. Morels are edible mushrooms well known across the world and highly prized in many kitchens. Here, several candidate genes were selected and designed according to the Morchella importuna transcriptome data. The stability of the candidate genes was evaluated with geNorm and NormFinder under three different experimental conditions, and several genes with excellent stability were selected. The broad applicability of the selected genes was tested in ten Morchella species. Results from the three experimental conditions revealed that ACT1 and INTF7 were the most prominent genes in Morchella, CYC3 was the most stable gene across development stages, INTF4/AEF3 were the top-ranked genes across carbon sources, and the INTF3/CYC3 pair showed robust stability under temperature stress treatment. We suggest using ACT1, AEF3, CYC3, INTF3, INTF4 and INTF7 as reference genes for gene expression analysis in any of the 10 Morchella strains tested in this study. The stability and practicality of the genes encoding vacuolar protein sorting (INTF3), vacuolar ATP synthase (INTF4) and 14-3-3 protein (INTF7), which are involved in basic biological processes, were validated for the first time as candidate reference genes for quantitative PCR. Furthermore, the stability of the reference genes was found to vary under the three different experimental conditions, indicating the importance of identifying specific reference genes for particular conditions.


2017 ◽  
Vol 29 (1) ◽  
pp. 173
Author(s):  
Z. Jiang ◽  
J. Sun ◽  
S. Marjani ◽  
H. Dong ◽  
X. Zheng ◽  
...  

Appropriate reference genes for accurate normalization in RT-PCR are essential for the study of gene expression. Ideal reference genes should not only have stable expression across stages of embryo development, but also be expressed at comparable levels to the target genes. Using RNA-seq data from in vivo-produced bovine oocytes and embryos from the 2-cell to blastocyst stage (Jiang et al., 2014 BMC Genomics 15, 756), we tried to establish a catalogue of all reference genes for RT-PCR analysis. One-way ANOVA generated 4055 genes that did not differ across stages. To reduce this list, we used the entire RNA-seq data set and first removed genes with a FPKM (fragments per kilobase of transcript per million mapped reads) of <1, and then rescaled each gene’s expression values within a range of 0 to 1. We subsequently calculated the expression variance for each gene across all stages. By assuming that the calculated variances follow a Gaussian distribution and that the majority of the genes do not have a stable expression level, a gene was classified as a reference if its variance significantly deviated (P < 0.05) from these assumptions. We identified 346 potential reference genes, all of which were among the candidates from the ANOVA analysis. We arbitrarily assigned genes in this list to high (FPKM ≥ 100), medium (10 < FPKM < 100), and low expression levels (FPKM ≤ 10), and 37, 154, and 155 genes, respectively, fell into these groups. Surprisingly, none of the commonly used reference genes, such as GAPDH, PPIA, ACTB, PRL15, GUSB, and H3F2A, were identified as being stably expressed across in vivo development. This is consistent with findings of prior RT-PCR studies (Robert et al. 2002 Biol. Reprod. 67, 1465–1472; Ross et al. 2010 Cell Reprogram. 12, 709–717). The following gene ontology terms were significantly enriched for the 346 genes: cell cycle, translation, transport, chromatin, cell division, and metabolic process, indicating that the early embryos maintained constant levels of genes involved in fundamental biological functions. Finally, we performed RT-PCR to validate the RNA-seq results using different bovine in vivo-derived oocytes and embryos (n = 3/stage). We successfully validated 10 selected genes, including those in the high (CS, PGD, and ACTR3), medium (CCT5, MRPL47, COG2, CRT9, and HELLS), and low expression groups (CDC23 and TTF1). In conclusion, we recommend the use of reference genes that are expressed at comparable levels to target genes. This study offers a useful resource to aid in the appropriate selection of reference genes, which will improve the accuracy of quantitative gene expression analyses across bovine embryo pre-implantation development.
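
The filtering steps described in this abstract (FPKM cutoff, rescaling to the 0 to 1 range, per-gene variance, Gaussian-based selection, and grouping by expression level) can be illustrated with a minimal Python sketch. It is not the authors' code, and several details are assumptions because the abstract does not specify them: the cutoff is applied to the mean FPKM across stages, rescaling divides each gene by its own maximum, and "significantly deviated" is read as falling in the lower 5% tail of a normal distribution fitted to all variances. The function names and the input layout (gene name mapped to one FPKM value per stage) are hypothetical.

```python
from statistics import NormalDist, mean, pstdev, pvariance


def scaled_variance(values):
    """Variance of one gene's stage values after dividing by the gene's maximum.

    Assumption: "rescaled within a range of 0 to 1" is read as max-scaling.
    """
    peak = max(values)
    return pvariance([v / peak for v in values])


def candidate_reference_genes(fpkm_by_gene, min_fpkm=1.0, alpha=0.05):
    """Return genes whose rescaled variance is unusually low.

    fpkm_by_gene: dict mapping gene name -> list of FPKM values, one per stage.
    """
    # Step 1: drop lowly expressed genes (assumed: mean FPKM below the cutoff).
    kept = {g: v for g, v in fpkm_by_gene.items() if mean(v) >= min_fpkm}

    # Steps 2 and 3: rescale each gene and compute its variance across stages.
    variances = {g: scaled_variance(v) for g, v in kept.items()}

    # Step 4: fit a Gaussian to all variances and keep genes in the lower tail
    # (assumed one-sided test; requires that the variances are not all equal).
    all_vars = list(variances.values())
    dist = NormalDist(mean(all_vars), pstdev(all_vars))
    return sorted(g for g, var in variances.items() if dist.cdf(var) < alpha)


def expression_level(mean_fpkm):
    """Group a gene by the FPKM thresholds quoted in the abstract."""
    if mean_fpkm >= 100:
        return "high"
    if mean_fpkm > 10:
        return "medium"
    return "low"
```

Calling `candidate_reference_genes` on a dictionary of per-stage FPKM values returns a sorted list of gene names, and `expression_level(mean(values))` then assigns each selected gene to the high, medium, or low group. Dividing by the maximum rather than min-max scaling is a deliberate choice in this sketch: min-max scaling would force every gene to span exactly 0 to 1 and would be undefined for a perfectly constant gene.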


2020 ◽  
Author(s):  
Ryo Miyamoto ◽  
Akinori Kanai ◽  
Hiroshi Okuda ◽  
Satoshi Takahashi ◽  
Hirotaka Matsui ◽  
...  

Abstract HOXA9 is often highly expressed in leukemias. However, its precise roles in leukemogenesis remain elusive. Here, we show that HOXA9 maintains gene expression for multiple anti-apoptotic pathways to promote leukemogenesis. In MLL-rearranged leukemia, MLL fusion directly activates the expression of MYC and HOXA9. Combined expression of MYC and HOXA9 induced leukemia, whereas single gene transduction of either did not, indicating a synergy between MYC and HOXA9. HOXA9 sustained expression of the genes implicated in hematopoietic precursor identity when expressed in hematopoietic precursors, but did not reactivate them once silenced. Among the HOXA9 target genes, BCL2 and SOX4 synergistically induced leukemia with MYC. Not only BCL2 but also SOX4 suppressed apoptosis, indicating that multiple anti-apoptotic pathways underlie cooperative leukemogenesis by HOXA9 and MYC. These results demonstrate that HOXA9 is a key transcriptional maintenance factor that promotes MYC-mediated leukemogenesis, potentially explaining why HOXA9 is highly expressed in many leukemias.


2020 ◽  
Author(s):  
Guiomar Martín ◽  
Yamile Márquez ◽  
Federica Mantica ◽  
Paula Duque ◽  
Manuel Irimia

Abstract Background Alternative splicing (AS) is a widespread regulatory mechanism in multicellular organisms. Numerous transcriptomic and single-gene studies in plants have investigated AS in response to specific conditions, especially environmental stress, unveiling substantial amounts of intron retention that modulate gene expression. However, a comprehensive study contrasting stress-response and tissue-specific AS patterns and directly comparing them with those of animal models is still missing. Results We generated a massive resource for A. thaliana (PastDB; pastdb.crg.eu), comprising AS and gene expression quantifications across tissues, development and environmental conditions, including abiotic and biotic stresses. Harmonized analysis of these datasets revealed that A. thaliana shows high levels of AS (similar to fruitflies) and that, compared to animals, disproportionately uses AS for stress responses. We identified core sets of genes regulated specifically by either AS or transcription upon stresses or among tissues, a regulatory specialization that was tightly mirrored by the genomic features of these genes. Unexpectedly, non-intron retention events, including exon skipping, were overrepresented across regulated AS sets in A. thaliana, being also largely involved in modulating gene expression through NMD and uORF inclusion. Conclusions Non-intron retention events have likely been functionally underrated in plants. AS constitutes a distinct regulatory layer controlling gene expression upon internal and external stimuli whose target genes and master regulators are hardwired at the genomic level to specifically undergo post-transcriptional regulation. Given the higher relevance of AS in the response to different stresses when compared to animals, this molecular hardwiring is likely required for a proper environmental response in A. thaliana.


2020 ◽  
Author(s):  
Rwik Sen ◽  
Ezra Lencer ◽  
Elizabeth A. Geiger ◽  
Kenneth L. Jones ◽  
Tamim H. Shaikh ◽  
...  

Abstract Congenital Heart Defects (CHDs) are the most common form of birth defects, observed in 4-10/1000 live births. CHDs result in a wide range of structural and functional abnormalities of the heart which significantly affect quality of life and mortality. CHDs are often seen in patients with mutations in epigenetic regulators of gene expression, like the genes implicated in Kabuki syndrome, KMT2D and KDM6A, which play important roles in normal heart development and function. Here, we examined the role of two epigenetic histone modifying enzymes, KMT2D and KDM6A, in the expression of genes associated with early heart and neural crest cell (NCC) development. Using CRISPR/Cas9-mediated mutagenesis of kmt2d, kdm6a and kdm6al in zebrafish, we show that cardiac and NCC gene expression is reduced, which corresponds to affected cardiac morphology and reduced heart rates. To translate our results to a human pathophysiological context and compare transcriptomic targets of KMT2D and KDM6A across species, we performed RNA sequencing (RNA-seq) of lymphoblastoid cells from Kabuki syndrome patients carrying mutations in KMT2D and KDM6A. We compared the human RNA-seq datasets with RNA-seq datasets obtained from mouse and zebrafish. Our comparative interspecies analysis revealed common targets of KMT2D and KDM6A, which are shared between species, and these target genes are reduced in expression in the zebrafish mutants. Taken together, our results show that KMT2D and KDM6A regulate common and unique genes across humans, mice, and zebrafish for early cardiac and overall development, which can contribute to the understanding of epigenetic dysregulation in CHDs.


2022 ◽  
Vol 12 ◽  
Author(s):  
Ningning Fu ◽  
Jiaxing Li ◽  
Ming Wang ◽  
Lili Ren ◽  
Shixiang Zong ◽  
...  

A strict relationship exists between Sirex noctilio and Amylostereum areolatum, which is carried and spread by its partner. The growth and development of this symbiotic fungus is key to completing the life history of the Sirex woodwasp. Real-time quantitative polymerase chain reaction (RT-qPCR) is used to measure gene expression in samples of A. areolatum at different growth stages and to explore the key genes and pathways involved in the growth and development of this symbiotic fungus. To obtain accurate RT-qPCR data, target genes need to be normalized against reference genes that are stably expressed under specific experimental conditions. In our study, the stability of 10 candidate reference genes in symbiotic fungal samples at different growth and development stages was evaluated using geNorm, NormFinder, BestKeeper, delta Ct methods, and RefFinder. Meanwhile, laccase1 was used to validate the stability of the selected reference genes. Under the experimental conditions of this study, p450, CYP, and γ-TUB were identified as suitable reference genes. This work is the first to systematically evaluate reference genes for the normalization of RT-qPCR results during the growth of this symbiotic fungus, which lays a foundation for further gene expression experiments and for understanding the symbiotic relationship and mechanism between S. noctilio and A. areolatum.


2020 ◽  
Author(s):  
Haiying Geng ◽  
Meng Wang ◽  
Jiazhen Gong ◽  
Yupu Xu ◽  
Shisong Ma

ABSTRACT Gene expression regulation by transcription factors (TFs) has long been studied, but no model exists yet that can accurately predict transcriptome profiles based on TF activities. We have constructed a universal predictor for Arabidopsis to predict the expression of 28192 non-TF genes using 1678 TFs. Applied to bulk RNA-Seq samples from diverse tissues, the predictor produced accurate predicted transcriptomes correlating well with actual expression, with an average correlation coefficient of 0.986. Having recapitulated the quantitative relationships between TFs and target genes, the predictor further enabled downstream inference of TF regulators for genes and pathways, i.e. those involved in suberin, flavonoid, glucosinolate metabolism, lateral root, xylem, secondary cell wall development, and endoplasmic reticulum stress response. Our predictor provides an innovative approach to study transcriptional regulation.


2017 ◽  
Author(s):  
Nisar Wani ◽  
Khalid Raza

Abstract Gene expression patterns determine the manner whereby organisms regulate various cellular processes and therefore their organ functions. These patterns do not emerge on their own, but as a result of diverse regulatory factors such as DNA-binding proteins known as transcription factors (TFs), chromatin structure, and various other environmental factors. TFs play a pivotal role in gene regulation by binding to different locations on the genome and influencing the expression of their target genes. Therefore, predicting target genes and their regulation becomes an important task for understanding the mechanisms that control cellular processes governing both healthy and diseased cells. In this paper, we propose an integrated inference pipeline for predicting target genes and their regulatory effects for a specific TF using next-generation data analysis tools.

