Big-data analysis unearths the general regulatory regime in normal human genome and cancer

2019 ◽  
Author(s):  
Anyou Wang ◽  
Hai Rong

Gene regulation explains most variation in biological phenotype and remains a central topic in biology. Conventionally, gene regulation is inferred by manipulating gene sequences, for example by knockout, but such inferences suffer from pitfalls such as transcript compensation [1], which lead to biased results. Unbiased views of regulation have rarely been obtained. Here, we develop a software tool, FINET [2], to infer unbiased regulatory networks from massive data, including all human RNA-seq data publicly available from the Sequence Read Archive (SRA, 274469 samples) and The Cancer Genome Atlas (TCGA, 11574 samples), and unearth the general regulatory rules in the normal genome and in cancer, as deposited [3]. Generally, the genome is positively regulated. Regulators primarily regulate targets in their own annotated category, such as processed pseudogenes regulating processed pseudogenes. In the normal genome, ribosomal proteins drive the regulatory network, and proteins tightly control the genome, primarily regulating remote proteins across chromosomes but rarely regulating local targets (<1 Mb). In cancer, by contrast, noncoding RNAs, especially pseudogenes, strongly activate the genome and induce local targets, including noncoding RNAs and proteins. As a result, the whole regulatory regime switches from a normal, remote, protein-controlled domain to a cancerous, local, noncoding RNA-activated niche. This parallels our recent discovery from clinical data that noncoding RNAs, rather than proteins as conventionally thought, are the deadliest drivers of cancer [4]. This refreshes the fundamental basis of cancer research and therapy. Overall, our findings provide a systems-level view of the natural regulatory regime in the human genome, which helps to correct biased notions in the current literature.
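The local-versus-remote distinction drawn in this abstract can be illustrated with a minimal sketch. It assumes each gene is described by a chromosome name and a start coordinate, and it treats a target as local when it lies on the same chromosome within 1 Mb of its regulator. The `Gene` type, the `is_local` helper, and the distance rule are hypothetical illustrations; they are not taken from FINET.

```python
from dataclasses import dataclass

# The "<1 Mb" threshold quoted in the abstract, in base pairs.
LOCAL_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record: name, chromosome, start coordinate."""
    name: str
    chrom: str
    start: int


def is_local(regulator: Gene, target: Gene, window: int = LOCAL_WINDOW_BP) -> bool:
    """Return True when the target is on the same chromosome within `window` bp."""
    if regulator.chrom != target.chrom:
        # Targets on another chromosome are always treated as remote.
        return False
    return abs(regulator.start - target.start) < window
```

Under this assumed rule, a regulator-target pair on different chromosomes is always remote, which matches the abstract's description of proteins regulating "remote proteins across chromosomes".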

Epigenomics ◽  
2020 ◽  
Vol 12 (21) ◽  
pp. 1929-1947
Author(s):  
Wei Xiong ◽  
Mengran Yao ◽  
Yuqiao Yang ◽  
Yan Qu ◽  
Jinqiao Qian

Diabetic cardiovascular diseases (DCVDs) are the most common complications of diabetes mellitus and are considered to be one of the most important threats to global health and an economic burden. Long noncoding RNA (lncRNA), circular RNA (circRNA), and miRNA are a novel group of noncoding RNAs that are involved in the regulation of various pathophysiological processes, including DCVDs. Interestingly, both lncRNA and circRNA can act as competing endogenous RNA of miRNA, thereby regulating the expression of the target mRNA by decoying or sponging the miRNA. In this review, we focus on the mechanistic, pathological and functional roles of lncRNA/circRNA-miRNA-mRNA networks in DCVDs and further discuss the potential implications for early detection, therapeutic intervention and prognostic evaluation.


2015 ◽  
Vol 14s1 ◽  
pp. CIN.S24657
Author(s):  
Wan-Ping Lee ◽  
Jiantao Wu ◽  
Gabor T. Marth

Mobile elements constitute greater than 45% of the human genome as a result of repeated insertion events during human genome evolution. Although most mobile elements are fixed within the human population, some elements (including ALU, long interspersed elements (LINE) 1 (L1), and SVA) are still actively duplicating and may result in life-threatening human diseases such as cancer, motivating the need for accurate mobile-element insertion (MEI) detection tools. We developed a software package, TANGRAM, for MEI detection in next-generation sequencing data, currently serving as the primary MEI detection tool in the 1000 Genomes Project. TANGRAM takes advantage of valuable mapping information provided by our own MOSAIK mapper, and until recently required MOSAIK mappings as its input. In this study, we report a new feature that enables TANGRAM to be used on alignments generated by any mainstream short-read mapper, making it accessible to many genomic users. To demonstrate its utility for cancer genome analysis, we have applied TANGRAM to the TCGA (The Cancer Genome Atlas) mutation calling benchmark 4 dataset. TANGRAM is fast, accurate, easy to use, and open source at https://github.com/jiantao/Tangram.


2018 ◽  
Author(s):  
Seo Jeong Shin ◽  
Seng Chan You ◽  
Yu Rang Park ◽  
Jin Roh ◽  
Jang-Hee Kim ◽  
...  

BACKGROUND: Clinical sequencing data should be shared in order to achieve the sufficient scale and diversity required to provide strong evidence for improving patient care. A distributed research network allows researchers to share this evidence rather than the patient-level data across centers, thereby avoiding privacy issues. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) used in distributed research networks has low coverage of sequencing data and does not reflect the latest trends of precision medicine. OBJECTIVE: The aim of this study was to develop and evaluate the feasibility of a genomic CDM (G-CDM), as an extension of the OMOP-CDM, for application of genomic data in clinical practice. METHODS: Existing genomic data models and sequencing reports were reviewed to extend the OMOP-CDM to cover genomic data. The Human Genome Organisation Gene Nomenclature Committee and Human Genome Variation Society nomenclature were adopted to standardize the terminology in the model. Sequencing data of 114 and 1060 patients with lung cancer were obtained from the Ajou University School of Medicine database of Ajou University Hospital and The Cancer Genome Atlas, respectively, which were transformed to a format appropriate for the G-CDM. The data were compared with respect to gene name, variant type, and actionable mutations. RESULTS: The G-CDM was extended into four tables linked to tables of the OMOP-CDM. Upon comparison with The Cancer Genome Atlas data, a clinically actionable mutation, p.Leu858Arg, in the EGFR gene was 6.64 times more frequent in the Ajou University School of Medicine database, while the p.Gly12Xaa mutation in the KRAS gene was 2.02 times more frequent in The Cancer Genome Atlas dataset. The data-exploring tool GeneProfiler was further developed to conduct descriptive analyses automatically using the G-CDM, which provides the proportions of genes, variant types, and actionable mutations. GeneProfiler also allows for querying the specific gene name and Human Genome Variation Society nomenclature to calculate the proportion of patients with a given mutation. CONCLUSIONS: We developed the G-CDM for effective integration of genomic data with standardized clinical data, allowing for data sharing across institutes. The feasibility of the G-CDM was validated by assessing the differences in data characteristics between two different genomic databases through the proposed data-exploring tool GeneProfiler. The G-CDM may facilitate analyses of interoperating clinical and genomic datasets across multiple institutions, minimizing privacy issues and enabling researchers to better understand the characteristics of patients and promote personalized medicine in clinical practice.
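The kind of comparison reported in this abstract (the proportion of patients carrying a given mutation in each cohort, and how many times more frequent it is in one cohort than the other) can be sketched as follows. This is a minimal illustration, not GeneProfiler's implementation: the function names, the representation of a patient as a set of variant strings, and the toy cohorts in the usage note are all assumptions, and the toy numbers are not the study's data.

```python
from typing import Iterable, Set


def mutation_proportion(patients: Iterable[Set[str]], mutation: str) -> float:
    """Fraction of patients whose variant set contains `mutation`."""
    cohort = list(patients)
    if not cohort:
        raise ValueError("no patients supplied")
    carriers = sum(1 for variants in cohort if mutation in variants)
    return carriers / len(cohort)


def fold_difference(proportion_a: float, proportion_b: float) -> float:
    """How many times more frequent the mutation is in cohort A than in cohort B."""
    if proportion_b == 0:
        raise ZeroDivisionError("mutation absent from cohort B; ratio is undefined")
    return proportion_a / proportion_b
```

For example, with a toy cohort A in which 2 of 4 patients carry `"EGFR:p.Leu858Arg"` and a toy cohort B in which 1 of 4 do, `mutation_proportion` returns 0.5 and 0.25, and `fold_difference(0.5, 0.25)` returns 2.0.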


2018 ◽  
Vol 72 ◽  
pp. 991-996
Author(s):  
Marzena Anna Lewandowska ◽  
Łukasz Żołna ◽  
Krzysztof Roszkowski ◽  
Janusz Kowalewski

Fifteen years after the publication of the full sequence of the human genome, which revolutionized medicine and biotechnology, thorough elucidation of the molecular mechanisms of genetic disorders remains a challenge. National and international institutions conduct a number of research projects in genomics. Some focus on the characterization of functional elements of the genome (e.g., the Genome Browser database by the ENCODE consortium), some gather information on polymorphisms (HapMap, The 1000 Genomes Project) and mutations (The Human Gene Mutation Database), while others are specifically dedicated to the genomic characterization of cancer (The Cancer Genome Atlas, The Pediatric Cancer Genome Project). Even though the projects are conducted independently, the constantly updated project data can be juxtaposed, leading to interesting results. Genome-wide association studies (GWAS) allowed the identification of millions of SNPs and short insertions/deletions, as well as thousands of structural variants of polymorphic gene products. Further data-mining studies allowed the distinction between synonymous and nonsynonymous SNPs, which became the basis for epidemiological studies of various types of genetic disorders. The results of sequencing entire genomes and transcriptomes may be useful in the identification of novel prognostic and predictive markers. High-throughput technologies are emerging methods in molecular diagnostics. Furthermore, the correlation of DNA methylation patterns and gene expression profiles may also provide useful results in cancer diagnostics.


2014 ◽  
Vol 13s4 ◽  
pp. CIN.S13979 ◽  
Author(s):  
Wan-Ping Lee ◽  
Jiantao Wu ◽  
Gabor T. Marth

Mobile elements constitute greater than 45% of the human genome as a result of repeated insertion events during human genome evolution. Although most mobile elements are fixed within the human population, some elements (including ALU, long interspersed elements (LINE) 1 (L1), and SVA) are still actively duplicating and may result in life-threatening human diseases such as cancer, motivating the need for accurate mobile-element insertion (MEI) detection tools. We developed a software package, TANGRAM, for MEI detection in next-generation sequencing data, currently serving as the primary MEI detection tool in the 1000 Genomes Project. TANGRAM takes advantage of valuable mapping information provided by our own MOSAIK mapper, and until recently required MOSAIK mappings as its input. In this study, we report a new feature that enables TANGRAM to be used on alignments generated by any mainstream short-read mapper, making it accessible to many genomic users. To demonstrate its utility for cancer genome analysis, we have applied TANGRAM to the TCGA (The Cancer Genome Atlas) mutation calling benchmark 4 dataset. TANGRAM is fast, accurate, easy to use, and open source at https://github.com/jiantao/Tangram.


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Yunyun Lan ◽  
Juan Su ◽  
Yaxin Xue ◽  
Lulu Zeng ◽  
Xun Cheng ◽  
...  

Background. Breast cancer (BRCA) is one of the most common cancers and the leading cause of cancer-related death in women. RNA-binding proteins (RBPs) play an important role in the emergence and pathogenesis of tumors. The target RNAs of RBPs are very diverse; in addition to binding to mRNA, RBPs also bind to noncoding RNA. Noncoding RNA can form secondary structures that bind to RBPs and regulate multiple processes such as splicing, RNA modification, protein localization, and chromosome remodeling, which can lead to tumor initiation, progression, and invasion. Methods. (1) BRCA data were downloaded from The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) databases and were used as training and testing datasets, respectively. (2) The prognostic RBP-related genes were screened according to the overlapping differentially expressed genes (DEGs) from the TCGA database. (3) Univariate Cox proportional hazard regression was performed to identify the genes with significant prognostic value. (4) Further, we used LASSO regression to construct a prognostic signature and validated the signature in the TCGA and ICGC cohorts. (5) We also performed prognostic analysis, expression level verification, immune cell correlation analysis, and drug correlation analysis of the genes in the model. Results. Four genes (MRPL13, IGF2BP1, BRCA1, and MAEL) were identified as a prognostic gene signature. The prognostic model has been validated in the TCGA and ICGC cohorts. The risk score calculated from the four-gene signature could largely predict overall survival at 1, 3, and 5 years in patients with BRCA. The calibration plot demonstrated outstanding consistency between the prediction and actual observation. Online database verification revealed that these four genes were significantly highly expressed in tumors. We also observed significant correlations with some immune cells and potential correlations with some drugs. Conclusion. We constructed a 4-RBP-based prognostic signature to predict the prognosis of BRCA patients, and it has the potential for treating and diagnosing BRCA.
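A gene-signature risk score of the kind mentioned in this abstract is commonly computed as a weighted sum of expression values, with one regression coefficient per gene. The sketch below assumes that form; the abstract does not state the exact formula or the coefficients, so the `risk_score` helper and the placeholder gene names and weights in the usage note are hypothetical.

```python
from typing import Mapping


def risk_score(expression: Mapping[str, float],
               coefficients: Mapping[str, float]) -> float:
    """Weighted sum of expression values over the signature genes.

    `coefficients` maps each signature gene to its regression weight
    (placeholder values in any example; not taken from the study).
    """
    missing = set(coefficients) - set(expression)
    if missing:
        raise KeyError(f"missing expression values for: {sorted(missing)}")
    return sum(weight * expression[gene] for gene, weight in coefficients.items())
```

For instance, with placeholder weights `{"GENE_A": 0.5, "GENE_B": -0.25}` and expression values `{"GENE_A": 2.0, "GENE_B": 2.0}`, the score is 0.5 × 2.0 − 0.25 × 2.0 = 0.5.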


2020 ◽  
Vol 22 (1) ◽  
Author(s):  
Hao-Yu Guo ◽  
Ming-Ke Guo ◽  
Zhong-Yuan Wan ◽  
Fang Song ◽  
Hai-Qiang Wang

Intervertebral disc degeneration (IDD) is the most common cause of low-back pain. Accumulating evidence indicates that the expression profiles of noncoding RNAs (ncRNAs), including microRNAs (miRNAs), circular RNAs (circRNAs), and long noncoding RNAs (lncRNAs), differ between intervertebral disc tissues obtained from healthy individuals and from patients with IDD. However, the roles of ncRNAs in IDD remain unclear. In this review, we summarize the studies concerning ncRNA interactions and regulatory functions in IDD. Apoptosis, aberrant proliferation, extracellular matrix (ECM) degradation, and inflammatory abnormality are the four fundamental pathologic phenotypes in IDD. We show that ncRNAs play vital roles in the apoptosis, proliferation, ECM degradation, and inflammation processes of IDD. The ncRNAs participate in the underlying mechanisms of IDD in different ways. MiRNAs downregulate target genes’ expression by directly binding to the 3′-untranslated region of mRNAs. CircRNAs and lncRNAs act as sponges or competing endogenous RNAs by competitively binding to miRNAs and regulating the expression of mRNAs. The lncRNAs, circRNAs, miRNAs, and mRNAs crosstalk widely and form complex regulatory networks in the degenerative processes. The current review presents novel insights into the pathogenesis of IDD and potentially sheds light on future therapeutics.


2020 ◽  
Vol 49 (D1) ◽  
pp. D1489-D1495 ◽  
Author(s):  
Jingjing Jin ◽  
Peng Lu ◽  
Yalong Xu ◽  
Zefeng Li ◽  
Shizhou Yu ◽  
...  

Long noncoding RNAs (lncRNAs) are transcripts longer than 200 nucleotides with little or no protein coding potential. The expanding list of lncRNAs and accumulating evidence of their functions in plants have necessitated the creation of a comprehensive database for lncRNA research. However, currently available plant lncRNA databases have some deficiencies, including the lack of lncRNA data from some model plants, uneven annotation standards, a lack of visualization for expression patterns, and the absence of epigenetic information. To overcome these problems, we upgraded our Plant Long noncoding RNA Database (PLncDB, http://plncdb.tobaccodb.org/), which was based on a uniform annotation pipeline. PLncDB V2.0 currently contains 1 246 372 lncRNAs for 80 plant species based on 13 834 RNA-Seq datasets, integrating lncRNA information from four other resources, including EVLncRNAs and RNAcentral. Expression patterns and epigenetic signals can be visualized using multiple tools (JBrowse, eFP Browser and EPexplorer). Targets and regulatory networks for lncRNAs are also provided for function exploration. In addition, PLncDB V2.0 is hierarchical and user-friendly and has five built-in search engines. We believe PLncDB V2.0 is useful for the plant lncRNA community and data mining studies and provides a comprehensive resource for data-driven lncRNA research in plants.


2020 ◽  
Vol 11 ◽  
Author(s):  
Renliang Sun ◽  
Yizhou Xu ◽  
Hang Zhang ◽  
Qiangzhen Yang ◽  
Ke Wang ◽  
...  

Hepatocellular carcinoma (HCC) is the predominant form of liver cancer and has long been among the top three cancers that cause the most deaths worldwide. Therapeutic options for HCC are limited due to the pronounced tumor heterogeneity. Thus, there is a critical need to study HCC from a systems point of view to discover effective therapeutic targets, such as through the systematic study of disease perturbation in both regulation and metabolism using a unified model. Such integration makes sense for cancers as it links one of the dominant physiological features of cancers (growth, which is driven by metabolic networks) with the primary available omics data source, transcriptomics (which is systematically integrated with metabolism through the regulatory-metabolic network model). Here, we developed an integrated transcriptional regulatory-metabolic model for HCC molecular stratification and the prediction of potential therapeutic targets. To predict transcription factors (TFs) and target genes affecting tumorigenesis, we used two algorithms to reconstruct the genome-scale transcriptional regulatory networks for HCC and normal liver tissue, which were then integrated with corresponding constraint-based metabolic models. Five key TFs affecting cancer cell growth were identified. They included the regulator CREB3L3, which has been associated with poor prognosis. Comprehensive personalized metabolic analysis based on models generated from data of liver HCC in The Cancer Genome Atlas revealed 18 genes essential for tumorigenesis in all three subtypes of patients stratified based on the non-negative matrix factorization method and two other genes (ACADSB and CMPK1) that have been strongly correlated with the lower overall survival subtype. Among these 20 genes, 11 are targeted by approved drugs for cancers or cancer-related diseases, and six other genes have corresponding drugs being evaluated experimentally or investigationally. The remaining three genes represent potential targets. We also validated the stratification and prognosis results with an independent dataset of HCC cohort samples (LIRI-JP) from the International Cancer Genome Consortium database. In addition, microRNAs targeting key TFs and genes were also involved in established cancer-related pathways. Taken together, the multi-scale regulatory-metabolic model provided a new approach to assess key mechanisms of HCC cell proliferation in the context of systems and suggested potential targets.


Epigenomics ◽  
2020 ◽  
Vol 12 (15) ◽  
pp. 1303-1315
Author(s):  
Weibo Du ◽  
Wenbiao Chen ◽  
Zheyue Shu ◽  
Dairong Xiang ◽  
Kefan Bi ◽  
...  

Aim: This study aimed to identify long noncoding RNAs (lncRNAs) with potential to be prognostic biomarkers of hepatocellular carcinoma (HCC) by analyzing copy number alterations (CNAs). Methods: RNA sequencing data from 369 HCC patients were downloaded from The Cancer Genome Atlas database and analyzed with a series of systematic bioinformatics methods. Results: LncRNA-CNA association analysis revealed that many lncRNAs were located in sites frequently amplified or deleted. Three upregulated lncRNAs (LINC00689, SNHG20 and MAFG-AS1) with copy amplification and one downregulated lncRNA, TMEM220-AS1, with copy deletion were associated with poor prognosis of HCC. Conclusion: This study reveals that differentially expressed lncRNAs correlate with CNAs in HCC. Moreover, the differentially expressed lncRNAs and their copy amplifications/deletions could be promising prognostic biomarkers of HCC.

