Gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure

2019
Author(s): Jan Zrimec, Filip Buric, Azam Sheikh Muhammad, Rhongzen Chen, Vilhelm Verendel, ...

Abstract Understanding the genetic regulatory code that governs gene expression is a primary, yet challenging aspiration in molecular biology that opens up possibilities to cure human diseases and solve biotechnology problems. However, the fundamental question of how each of the individual coding and non-coding regions of the gene regulatory structure interact and contribute to the mRNA expression levels remains unanswered. Considering that all the information for gene expression regulation is already present in living cells, here we applied deep learning on over 20,000 mRNA datasets in 7 model organisms ranging from bacteria to human. We show that in all organisms, mRNA abundance can be predicted directly from the DNA sequence with high accuracy, demonstrating that up to 82% of the variation of gene expression levels is encoded in the gene regulatory structure. Coding and non-coding regions carry both overlapping and orthogonal information and additively contribute to gene expression levels. By searching for DNA regulatory motifs present across the whole gene regulatory structure, we discover that motif interactions can regulate gene expression levels in a range of over three orders of magnitude. The uncovered co-evolution of coding and non-coding regions challenges the current paradigm that single motifs or regions are solely responsible for gene expression levels. Instead, we show that the correct combination of all regulatory regions must be established in order to accurately control gene expression levels. Therefore, the holistic system that spans the entire gene regulatory structure is required to analyse, understand, and design any future gene expression systems.

2020
Vol 11 (1)
Author(s): Jan Zrimec, Christoph S. Börlin, Filip Buric, Azam Sheikh Muhammad, Rhongzen Chen, ...

Abstract Understanding the genetic regulatory code governing gene expression is an important challenge in molecular biology. However, how individual coding and non-coding regions of the gene regulatory structure interact and contribute to mRNA expression levels remains unclear. Here we apply deep learning on over 20,000 mRNA datasets to examine the genetic regulatory code controlling mRNA abundance in 7 model organisms ranging from bacteria to human. In all organisms, we can predict mRNA abundance directly from DNA sequence, with up to 82% of the variation of transcript levels encoded in the gene regulatory structure. By searching for DNA regulatory motifs across the gene regulatory structure, we discover that motif interactions could explain the whole dynamic range of mRNA levels. Co-evolution across coding and non-coding regions suggests that it is not single motifs or regions, but the entire gene regulatory structure and specific combination of regulatory elements that define gene expression levels.
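
One way to read the figure "up to 82% of the variation" is as a coefficient of determination (R²) between measured and predicted mRNA levels. The sketch below assumes that reading. It shows how a DNA sequence might be one-hot encoded as model input and how R² is computed; the names `one_hot` and `r_squared` and all toy values are illustrative assumptions, not the authors' pipeline or data.

```python
# Illustrative sketch only: function names and toy values are assumptions,
# not taken from the study described above.

BASES = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Any other symbol (for example N) becomes an all-zero row.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    print(one_hot("ACGN"))
    # Toy log-scale expression values, invented for illustration.
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.2, 1.9, 3.1, 3.7]
    print(round(r_squared(measured, predicted), 3))
```

An R² of 1.0 means the predictions reproduce every measured value, while 0.0 means they do no better than predicting the mean for every gene.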


2021
Author(s): Moataz Dowaidar

Changes in gene expression levels above or below a particular threshold may have a dramatic impact on phenotypes, leading to a wide spectrum of human illnesses. Gene-regulatory elements, also known as cis-regulatory elements (CREs), may change the amount, timing, or location (cell/tissue type) of gene expression, whereas mutations in a gene's coding sequence may result in lower or higher gene expression levels, resulting in protein loss or gain. Loss-of-function mutations in both copies of a gene produce recessive human illness, while haploinsufficient mutations in 65 genes are also known to be deleterious due to gain of function, according to the ClinVar1 and ClinGen3 databases. CREs include promoters, which lie near a gene's transcription start site and switch it on at predefined times, places, and levels. Distal CREs, such as enhancers and silencers, control promoters in a temporal and tissue-specific manner. Enhancers activate promoters, whereas silencers turn them off. Insulators also restrict promiscuous interactions between enhancers and gene promoters. Systematic genomic approaches can help understand the cis-regulatory circuitry of gene expression by detecting and functionally defining these CREs. This includes the new use of CRISPR–CRISPR-associated protein 9 (CRISPR–Cas9) and other editing approaches to discover CREs. Cis-regulation therapy (CRT) holds considerable promise for treating human disease. CRT may be used to upregulate or downregulate genes that cause disease through lower or higher levels of expression, and it may also be used to precisely adjust the expression of genes that assist in alleviating disease features. CRT may employ proteins that generate epigenetic modifications such as methylation, histone modification, or looping that regulates gene expression. Weighing CRT's advantages and downsides against alternative treatment methods is crucial. CRT platforms might become a practical technique to treat many genetic diseases that now lack treatment alternatives if academics, patient communities, clinicians, regulators and industry work together.


2019
Vol 12 (1)
Author(s): Masataka Kikuchi, Norikazu Hara, Mai Hasegawa, Akinori Miyashita, Ryozo Kuwano, ...

Abstract Background Genome-wide association studies (GWASs) have identified single-nucleotide polymorphisms (SNPs) that may be genetic factors underlying Alzheimer’s disease (AD). However, how these AD-associated SNPs (AD SNPs) contribute to the pathogenesis of this disease is poorly understood because most of them are located in non-coding regions, such as introns and intergenic regions. Previous studies reported that some disease-associated SNPs affect regulatory elements including enhancers. We hypothesized that non-coding AD SNPs are located in enhancers and affect gene expression levels via chromatin loops. Methods To characterize AD SNPs within non-coding regions, we extracted 406 AD SNPs with GWAS p-values of less than 1.00 × 10⁻⁶ from the GWAS catalog database. Of these, we selected 392 SNPs within non-coding regions. Next, we checked whether those non-coding AD SNPs were located in enhancers that typically regulate gene expression levels using publicly available data for enhancers that were predicted in 127 human tissues or cell types. We sought expression quantitative trait locus (eQTL) genes affected by non-coding AD SNPs within enhancers because enhancers are regulatory elements that influence the gene expression levels. To elucidate how the non-coding AD SNPs within enhancers affect the gene expression levels, we identified chromatin-chromatin interactions by Hi-C experiments.
Results We report the following findings: (1) nearly 30% of non-coding AD SNPs are located in enhancers; (2) eQTL genes affected by non-coding AD SNPs within enhancers are associated with amyloid beta clearance, synaptic transmission, and immune responses; (3) 95% of the AD SNPs located in enhancers co-localize with their eQTL genes in topologically associating domains suggesting that regulation may occur through chromatin higher-order structures; (4) rs1476679 spatially contacts the promoters of eQTL genes via CTCF-CTCF interactions; (5) the effect of other AD SNPs such as rs7364180 is likely to be, at least in part, indirect through regulation of transcription factors that in turn regulate AD associated genes. Conclusion Our results suggest that non-coding AD SNPs may affect the function of enhancers thereby influencing the expression levels of surrounding or distant genes via chromatin loops. This result may explain how some non-coding AD SNPs contribute to AD pathogenesis.
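
The first two Methods steps (keeping SNPs below the p-value threshold, then asking which of them fall inside predicted enhancers) can be sketched as a simple filter followed by an interval lookup. This is a minimal sketch under stated assumptions: the record layout, the SNP identifiers, the coordinates, and the helper names `filter_snps` and `snps_in_enhancers` are all hypothetical, enhancer intervals on each chromosome are assumed to be non-overlapping and half-open, and nothing here reflects the authors' actual tooling.

```python
# Illustrative sketch only: identifiers, coordinates and helper names are
# hypothetical; only the 1.00e-6 cutoff comes from the abstract above.
from bisect import bisect_right

P_VALUE_CUTOFF = 1.0e-6


def filter_snps(snps, cutoff=P_VALUE_CUTOFF):
    """Keep SNP records whose GWAS p-value is below the cutoff."""
    return [snp for snp in snps if snp["p_value"] < cutoff]


def snps_in_enhancers(snps, enhancers):
    """Return ids of SNPs whose position lies inside an enhancer.

    enhancers maps chromosome -> list of (start, end) intervals, assumed
    non-overlapping and half-open: start <= position < end.
    """
    index = {}
    for chrom, intervals in enhancers.items():
        ordered = sorted(intervals)
        index[chrom] = ([s for s, _ in ordered], [e for _, e in ordered])

    hits = []
    for snp in snps:
        starts, ends = index.get(snp["chrom"], ([], []))
        # Rightmost interval starting at or before the SNP position.
        i = bisect_right(starts, snp["pos"]) - 1
        if i >= 0 and snp["pos"] < ends[i]:
            hits.append(snp["id"])
    return hits


if __name__ == "__main__":
    toy_snps = [
        {"id": "snp_a", "chrom": "chr1", "pos": 150, "p_value": 5e-8},
        {"id": "snp_b", "chrom": "chr1", "pos": 500, "p_value": 2e-7},
        {"id": "snp_c", "chrom": "chr2", "pos": 150, "p_value": 1e-3},
    ]
    toy_enhancers = {"chr1": [(100, 200), (900, 1000)], "chr2": [(100, 200)]}
    significant = filter_snps(toy_snps)
    print(snps_in_enhancers(significant, toy_enhancers))
```

Sorting the interval starts once per chromosome lets each SNP be checked with a binary search rather than a scan over every enhancer. Real enhancer annotations may overlap, in which case an interval tree or a dedicated genomics library would be the safer choice.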


2012
Vol 2012
pp. 1-9
Author(s): Stephan P. Persengiev, Ivanela I. Kondova, Ronald E. Bontrop

The molecular instructions that govern gene expression regulation are encoded in the genome and ultimately determine the morphology and functional specifications of the human brain. As a consequence, changes in gene expression levels might be directly related to the functional decline associated with brain aging. Small noncoding RNAs, including miRNAs, comprise a group of regulatory molecules that modulate the expression of hundreds of genes which play important roles in brain metabolism. Recent comparative studies in humans and nonhuman primates revealed that miRNAs regulate multiple pathways and interconnected signaling cascades that are the basis for the cognitive decline and neurodegenerative disorders during aging. Identifying the roles of miRNAs and their target genes in model organisms combined with system-level studies of the brain would provide more comprehensive understanding of the molecular basis of brain deterioration during the aging process.


2021
Vol 22 (S11)
Author(s): Sung-Gwon Lee, Dokyun Na, Chungoo Park

Abstract Background Lately, high-throughput RNA sequencing has been extensively used to elucidate the transcriptome landscape and dynamics of cell types of different species. In particular, for most non-model organisms lacking complete reference genomes with high-quality annotation of genetic information, reference-free (RF) de novo transcriptome analyses, rather than reference-based (RB) approaches, are widely used, and RF analyses have substantially contributed toward understanding the mechanisms regulating key biological processes and functions. To date, numerous bioinformatics studies have been conducted for assessing the workflow, production rate, and completeness of transcriptome assemblies within and between RF and RB datasets. However, the degree of consistency and variability of results obtained by analyzing gene expression levels through these two different approaches have not been adequately documented. Results In the present study, we evaluated the differences in expression profiles obtained with RF and RB approaches and revealed that the former tends to be satisfactorily replaced by the latter with respect to transcriptome repertoires, as well as from a gene expression quantification perspective. In addition, we urge cautious interpretation of these findings. Several genes that are lowly expressed, have long coding sequences, or belong to large gene families must be validated carefully, whenever gene expression levels are calculated using the RF method. Conclusions Our empirical results indicate important contributions toward addressing transcriptome-related biological questions in non-model organisms.


Genes
2021
Vol 12 (6)
pp. 854
Author(s): Yishu Wang, Lingyun Xu, Dongmei Ai

DNA methylation is an important regulator of gene expression that can influence tumor heterogeneity and shows weak and varying expression levels among different genes. Gastric cancer (GC) is a highly heterogeneous cancer of the digestive system with a high mortality rate worldwide. The heterogeneous subtypes of GC lead to different prognoses. In this study, we explored the relationships between DNA methylation and gene expression levels by introducing a sparse low-rank regression model based on a GC dataset with 375 tumor samples and 32 normal samples from The Cancer Genome Atlas database. Differences in the DNA methylation levels and sites were found to be associated with differences in the expressed genes related to GC development. Overall, 29 methylation-driven genes were found to be related to the GC subtypes, and in the prognostic model, we explored five prognoses related to the methylation sites. Finally, based on a low-rank matrix, seven subgroups were identified with different methylation statuses. These specific classifications based on DNA methylation levels may help to account for heterogeneity and aid in personalized treatments.


2021
Vol 15 (1)
Author(s): Weitong Cui, Huaru Xue, Lei Wei, Jinghua Jin, Xuewen Tian, ...

Abstract Background RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible. Results Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis. Conclusions High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability and DE results should be interpreted cautiously unless soundly validated.

