CancerMIRNome: an interactive analysis and visualization database for miRNome profiles of human cancer

2021 ◽  
Author(s):  
Ruidong Li ◽  
Han Qu ◽  
Shibo Wang ◽  
John M Chater ◽  
Xuesong Wang ◽  
...  

Abstract MicroRNAs (miRNAs), which play critical roles in gene regulatory networks, have emerged as promising diagnostic and prognostic biomarkers for human cancer. In particular, circulating miRNAs that are secreted into circulation exist in remarkably stable forms, and have enormous potential to be leveraged as non-invasive biomarkers for early cancer detection. Novel and user-friendly tools are desperately needed to facilitate data mining of the vast amount of miRNA expression data from The Cancer Genome Atlas (TCGA) and large-scale circulating miRNA profiling studies. To fill this void, we developed CancerMIRNome, a comprehensive database for the interactive analysis and visualization of miRNA expression profiles based on 10,554 samples from 33 TCGA projects and 28,633 samples from 40 public circulating miRNome datasets. A series of cutting-edge bioinformatics tools and machine learning algorithms have been packaged in CancerMIRNome, allowing for the pan-cancer analysis of a miRNA of interest across multiple cancer types and the comprehensive analysis of miRNome profiles to identify dysregulated miRNAs and develop diagnostic or prognostic signatures. The data analysis and visualization modules will greatly facilitate the exploitation of these valuable resources and promote the translational application of miRNA biomarkers in cancer. The CancerMIRNome database is publicly available at http://bioinfo.jialab-ucr.org/CancerMIRNome.

2020 ◽  
Author(s):  
Ruidong Li ◽  
Han Qu ◽  
Shibo Wang ◽  
Xuesong Wang ◽  
Yanru Cui ◽  
...  

Abstract MicroRNAs (miRNAs), which play critical roles in gene regulatory networks, have emerged as promising biomarkers for a variety of human diseases, including cancer. In particular, circulating miRNAs that are secreted into circulation exist in remarkably stable forms, and have enormous potential to be leveraged as non-invasive diagnostic biomarkers for early cancer detection. The vast amount of miRNA expression data from tens of thousands of samples in various types of cancers generated by The Cancer Genome Atlas (TCGA) and circulating miRNA data produced by many large-scale circulating miRNA profiling studies provide extraordinary opportunities for the discovery and validation of miRNA signatures in cancer. Novel and user-friendly tools are desperately needed to facilitate the data mining of such valuable cancer miRNome datasets. To fill this void, we developed CancerMIRNome, a web server for interactive analysis and visualization of cancer miRNome data based on TCGA and public circulating miRNome datasets. A series of cutting-edge bioinformatics tools and functions have been packaged in CancerMIRNome, allowing for a pan-cancer analysis of a miRNA of interest across multiple cancer types and a comprehensive analysis of cancer miRNome at the dataset level. The CancerMIRNome web server is freely available at http://bioinfo.jialab-ucr.org/CancerMIRNome.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yanan Ren ◽  
Ting-You Wang ◽  
Leah C. Anderton ◽  
Qi Cao ◽  
Rendong Yang

Abstract Background Long non-coding RNAs (lncRNAs) are a growing focus in cancer research. Deciphering the pathways influenced by lncRNAs is important for understanding their role in cancer. Although knock-down or overexpression of lncRNAs followed by gene expression profiling in cancer cell lines is an established approach to this problem, such experimental data are not available for the majority of annotated lncRNAs. Results As a surrogate, we present lncGSEA, a convenient tool to predict lncRNA-associated pathways through Gene Set Enrichment Analysis of gene expression profiles from large-scale cancer patient samples. We demonstrate that lncGSEA is able to recapitulate lncRNA-associated pathways supported by literature and experimental validations in multiple cancer types. Conclusions LncGSEA allows researchers to infer lncRNA regulatory pathways directly from clinical samples in oncology. LncGSEA is written in R, and is freely accessible at https://github.com/ylab-hi/lncGSEA.
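As a rough illustration of the general idea (not lncGSEA's actual R implementation), the sketch below ranks genes by their Pearson correlation with a lncRNA's expression across samples and then computes an unweighted, Kolmogorov-Smirnov-style running-sum enrichment score for one gene set. The function names, the toy expression values, and the choice of correlation as the ranking metric are all assumptions made for illustration; the abstract does not specify them.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def rank_genes_by_lncrna(lncrna_expr, gene_expr):
    """Order genes from most positively to most negatively correlated.

    Assumption: correlation with the lncRNA is used as the ranking metric.
    """
    scores = {g: pearson(lncrna_expr, v) for g, v in gene_expr.items()}
    return sorted(scores, key=scores.get, reverse=True)


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some but not all ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a gene-set member, step down otherwise.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


# Hypothetical toy data: expression across five samples.
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
genes = {
    "G1": [1.1, 2.0, 2.9, 4.2, 5.1],  # tracks the lncRNA closely
    "G2": [2.0, 2.5, 3.5, 3.9, 5.5],  # tracks the lncRNA
    "G3": [3.0, 1.0, 4.0, 2.0, 3.5],  # weakly related
    "G4": [5.0, 4.1, 3.0, 2.2, 1.0],  # anti-correlated
}
ranking = rank_genes_by_lncrna(lncrna, genes)
es = enrichment_score(ranking, {"G1", "G2"})
```

In practice the significance of such a score would be assessed by permutation, and weighted variants of the running sum are common; both are omitted here to keep the sketch short.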


2017 ◽  
pp. 1-15 ◽  
Author(s):  
Russell Bonneville ◽  
Melanie A. Krook ◽  
Esko A. Kautto ◽  
Jharna Miya ◽  
Michele R. Wing ◽  
...  

Purpose Microsatellite instability (MSI) is a pattern of hypermutation that occurs at genomic microsatellites and is caused by defects in the mismatch repair system. Mismatch repair deficiency that leads to MSI has been well described in several types of human cancer, most frequently in colorectal, endometrial, and gastric adenocarcinomas. MSI is known to be both predictive and prognostic, especially in colorectal cancer; however, current clinical guidelines only recommend MSI testing for colorectal and endometrial cancers. Therefore, less is known about the prevalence and extent of MSI among other types of cancer. Methods Using our recently published MSI-calling software, MANTIS, we analyzed whole-exome data from 11,139 tumor-normal pairs from The Cancer Genome Atlas and Therapeutically Applicable Research to Generate Effective Treatments projects and external data sources across 39 cancer types. Within a subset of these cancer types, we assessed mutation burden, mutational signatures, and somatic variants associated with MSI. Results We identified MSI in 3.8% of all cancers assessed (present in 27 tumor types), most notably in adrenocortical carcinoma (ACC), cervical cancer (CESC), and mesothelioma, in which MSI has not yet been well described. In addition, MSI-high ACC and CESC tumors were observed to have a higher average mutational burden than microsatellite-stable ACC and CESC tumors. Conclusion We provide evidence of as-yet-unappreciated MSI in several types of cancer. These findings support an expanded role for clinical MSI testing across multiple cancer types, as patients with MSI-positive tumors are predicted to benefit from novel immunotherapies in clinical trials.


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0249424
Author(s):  
Stepan Nersisyan ◽  
Alexei Galatenko ◽  
Vladimir Galatenko ◽  
Maxim Shkurnikov ◽  
Alexander Tonevitsky

Analysis of regulatory networks is a powerful framework for the identification and quantification of intracellular interactions. We introduce miRGTF-net, a novel tool for the construction of miRNA-gene-TF networks. We consider multiple transcriptional and post-transcriptional interaction types, including regulation of gene and miRNA expression by transcription factors, gene silencing by miRNAs, and co-expression of host genes with their intronic miRNAs. The underlying algorithm uses information on experimentally validated interactions as well as integrative miRNA/mRNA expression profiles in a given set of samples. The latter ensures simultaneous tissue-specificity and biological validity of interactions. We applied miRGTF-net to paired miRNA/mRNA-sequencing data of breast cancer samples from The Cancer Genome Atlas (TCGA). Combining this with topological analysis of the constructed network, we showed that the nodes of the network can form reliable prognostic gene signatures for ER-positive breast cancer. A number of signatures demonstrated remarkably high accuracy on transcriptomic data obtained by both microarrays and RNA sequencing from several independent patient cohorts. Furthermore, a substantial part of the prognostic genes were identified as direct targets of the transcription factor E2F1. The putative interplay between estrogen receptor alpha and E2F1 was suggested as a potential recurrence factor in patients treated with tamoxifen. Source code for miRGTF-net is available on GitHub (https://github.com/s-a-nersisyan/miRGTF-net).


2019 ◽  
Author(s):  
Shaolong Cao ◽  
Zeya Wang ◽  
Fan Gao ◽  
Jingxiao Chen ◽  
Feng Zhang ◽  
...  

Abstract The deconvolution of transcriptomic data from heterogeneous tissues in cancer studies remains challenging. Available software has difficulty accurately estimating both component-specific proportions and expression profiles for individual samples. To address these challenges, we present a new R-implementation pipeline for more accurate and efficient transcriptome deconvolution of high-dimensional data from mixtures of more than two components. The pipeline utilizes the computationally efficient DeMixT R package with OpenMP and additional cancer-specific biological information to perform three-component deconvolution without requiring data from the immune profiles. It enables a wide application of DeMixT to gene expression datasets available from cancer consortia such as The Cancer Genome Atlas (TCGA) projects, where, other than the mixed tumor samples, a handful of normal samples are profiled in multiple cancer types. We have applied this pipeline to two TCGA datasets in colorectal adenocarcinoma (COAD) and prostate adenocarcinoma (PRAD). In COAD, we found varying distributions of immune proportions across the Consensus Molecular Subtypes, from the highest to the lowest being CMS1, CMS3, CMS4 and CMS2. In PRAD, we found that the immune proportions are associated with progression-free survival (p<0.01) and negatively correlated with Gleason scores (p<0.001). Our DeMixT-centered analysis protocol opens up new opportunities to investigate the tumor-stroma-immune microenvironment by providing both proportions and component-specific expressions, and thus to better define the underlying biology of cancer progression. Availability and implementation: An R package, scripts and data are available: https://github.com/wwylab/DeMixTallmaterials.


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Hanxiao Zhou ◽  
Yue Gao ◽  
Xin Li ◽  
Shipeng Shang ◽  
Peng Wang ◽  
...  

Abstract Background Emerging evidence has revealed that some long intergenic non-coding RNAs (lincRNAs) are likely to form clusters on the same chromosome, and that lincRNA genomic clusters might play critical roles in pathophysiological mechanisms. However, lincRNA clustering has rarely been investigated comprehensively, particularly with respect to its functional significance across different cancer types. Methods In this study, we first constructed a computational method based on a sliding-window approach for systematically identifying lincRNA genomic clusters. We then dissected these lincRNA genomic clusters to identify common characteristics in cooperative expression, conservation among divergent species, targeted miRNAs, and CNV frequency. Next, we performed comprehensive analyses of differential expression patterns and overall survival outcomes for patients from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project across multiple cancer types. Finally, we explored the underlying mechanisms of lincRNA genomic clusters by functional enrichment analysis, pathway analysis, and drug-target interaction analysis. Results We identified lincRNA genomic clusters according to the algorithm. Clustered lincRNAs tended to be co-expressed, highly conserved, targeted by more miRNAs, and subject to similar deletion and duplication frequencies, suggesting that lincRNA genomic clusters may exert their effects by acting in combination. We further systematically explored conserved and cancer-specific lincRNA genomic clusters, which indicated that they are involved in important mechanisms of disease occurrence through diverse approaches. Furthermore, lincRNA genomic clusters can serve as biomarkers with potential clinical significance and are involved in specific pathological processes in the development of cancer. Moreover, a lincRNA genomic cluster named Cluster127 in the DLK1-DIO3 imprinted locus was discovered, which contained MEG3, MEG8, MEG9, MIR381HG, LINC02285, AL132709.5, and AL132709.1. Further analysis indicated that Cluster127 may have the potential to predict prognosis in cancer and could act by participating in the regulation of the PI3K-AKT signaling pathway. Conclusions Clarifying the specific roles of lincRNA genomic clusters in human cancers could be beneficial for understanding the molecular pathogenesis of different cancer types.
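The abstract above does not give the window size or the minimum number of members used to call a cluster, so the following is only a minimal sketch of how a sliding-window pass over gene start coordinates might look. The function name `find_clusters`, the 100 kb window, the three-gene minimum, and the toy coordinates are all hypothetical choices, not values from the study.

```python
from collections import defaultdict


def find_clusters(genes, window=100_000, min_genes=3):
    """Group genes whose start sites fall within a sliding window.

    genes: iterable of (name, chromosome, start) tuples.
    window and min_genes are illustrative defaults, not published values.
    Overlapping dense windows are merged into one cluster.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start in genes:
        by_chrom[chrom].append((start, name))

    clusters = []
    for chrom, items in by_chrom.items():
        items.sort()
        spans = []  # index ranges [i, j] of windows dense enough to keep
        j = 0
        for i in range(len(items)):
            j = max(j, i)
            # Extend the right edge while the next gene still fits in the window.
            while j + 1 < len(items) and items[j + 1][0] - items[i][0] <= window:
                j += 1
            if j - i + 1 >= min_genes:
                if spans and i <= spans[-1][1]:
                    spans[-1][1] = max(spans[-1][1], j)  # merge with previous window
                else:
                    spans.append([i, j])
        for i, j in spans:
            clusters.append((chrom, [name for _, name in items[i:j + 1]]))
    return clusters


# Hypothetical coordinates for illustration only.
toy_genes = [
    ("A", "chr1", 1_000),
    ("B", "chr1", 40_000),
    ("C", "chr1", 90_000),
    ("D", "chr1", 120_000),
    ("E", "chr1", 900_000),
    ("F", "chr2", 5_000),
    ("G", "chr2", 700_000),
]
toy_clusters = find_clusters(toy_genes)
```

Note that merging overlapping windows lets a cluster span more than one window width (A through D here), which is one possible design choice; a stricter variant would cap each cluster at the window size.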


2020 ◽  
Author(s):  
Feixiong Cheng ◽  
Junfei Zhao ◽  
Yang Wang ◽  
Weiqiang Lu ◽  
Zehui Liu ◽  
...  

Abstract Technological and computational advances in genomics and interactomics have made it possible to identify rapidly how disease mutations perturb interaction networks within human cells. In this study, we investigate at large scale the effects of network perturbations caused by disease mutations within the human three-dimensional (3D), structurally resolved macromolecular interactome. We show that disease-associated germline mutations are significantly enriched in sequences encoding protein-protein interfaces compared to mutations identified in healthy subjects from the 1000 Genomes and ExAC projects; these interface mutations correspond to protein-protein interaction (PPI)-perturbing alleles, including p.Ser127Arg in PCSK9 at the PCSK9-LDLR interface. In addition, somatic missense mutations are significantly enriched in PPI interfaces compared to non-interfaces in 10,861 human exomes across 33 cancer subtypes/types from The Cancer Genome Atlas. Using a binomial statistical model, we computationally identified 470 PPIs harboring a statistically significant excess number of missense mutations at protein-protein interfaces (termed putative oncoPPIs) in pan-cancer analysis. We demonstrate that the oncoPPIs, including histone H4 complex in individual cancer types, are highly correlated with patient survival and drug resistance/sensitivity in human cancer cell lines and patient-derived xenografts. We experimentally validate the network effects of 13 oncoPPIs using a systematic binary interaction assay. We further show that ALOX5 p.Met146Lys at the ALOX5-MAD1L1 interface and RXRA p.Ser427Phe at the RXRA-PPARG interface promote significant tumor cell growth using cell line-based functional assays, providing a functional proof-of-concept. In summary, if broadly applied, this human 3D interactome network analysis offers a powerful tool for prioritizing alleles with mutations altering PPIs that may contribute to the pathobiology of human diseases, and may offer disease-specific targets for genotype-informed therapeutic discovery.
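The abstract mentions a binomial statistical model but does not spell out its null hypothesis. One plausible reading, sketched below under that assumption, is that each missense mutation lands on an interface residue with probability equal to the interface's share of the protein length, and the p-value is the upper tail of the resulting binomial distribution. The function names and the example counts are hypothetical.

```python
from math import comb


def binomial_upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x, n + 1))


def interface_enrichment_p(interface_mut, total_mut, interface_len, protein_len):
    """Upper-tail p-value for an excess of mutations at an interface.

    Assumed null: mutations fall uniformly along the protein, so the chance
    of hitting the interface is interface_len / protein_len.
    """
    p_null = interface_len / protein_len
    return binomial_upper_tail(interface_mut, total_mut, p_null)


# Hypothetical example: 8 of 20 missense mutations hit an interface that
# covers 30 of 300 residues (10% of the protein).
p_value = interface_enrichment_p(8, 20, 30, 300)
```

With many protein pairs tested in a pan-cancer setting, such p-values would normally be corrected for multiple testing before calling any interface significant.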


2019 ◽  
Author(s):  
Amy Li ◽  
Bjoern Chapuy ◽  
Xaralabos Varelas ◽  
Paola Sebastiani ◽  
Stefano Monti

Abstract The emergence of large-scale multi-omics data warrants method development for data integration. Genomic studies from cancer patients have identified epigenetic and genetic regulators (such as methylation marks, somatic mutations, and somatic copy number alterations (SCNAs), among others) as predictive features of cancer outcome. However, identification of “driver genes” associated with a given alteration remains a challenge. To this end, we developed a computational tool, iEDGE, to model cis and trans effects of (epi-)DNA alterations and identify potential cis driver genes, where cis and trans genes denote those genes falling within and outside the genomic boundaries of a given (epi-)genetic alteration, respectively. First, iEDGE identifies the cis and trans genes associated with the presence/absence of a particular epi-DNA alteration across samples. Tests of statistical mediation are then performed to determine the cis genes predictive of the trans gene expression. Finally, cis and trans effects are annotated by pathway enrichment analysis to gain insights into the underlying regulatory networks. We used iEDGE to perform integrative analysis of SCNAs and gene expression data from breast cancer and 18 additional cancer types included in The Cancer Genome Atlas (TCGA). Notably, cis gene drivers identified by iEDGE were found to be significantly enriched for known driver genes from multiple compendia of validated oncogenes and tumor suppressors, suggesting that the remainder are of equal importance. Furthermore, predicted drivers were enriched for functionally relevant cancer genes with amplification-driven dependencies, which are of potential prognostic and therapeutic value. All analysis results are accessible at https://montilab.bu.edu/iEDGE.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Nisar Wani ◽  
Debmalya Barh ◽  
Khalid Raza

Abstract Connecting transcriptional and post-transcriptional regulatory networks solves an important puzzle in the elucidation of gene regulatory mechanisms. To decipher the complexity of these connections, we build co-expression network modules for mRNA as well as miRNA expression profiles of breast cancer data. We construct gene and miRNA co-expression modules using the weighted gene co-expression network analysis (WGCNA) method and establish the significance of these modules (genes/miRNAs) for the cancer phenotype. This work also infers an interaction network between the genes of the turquoise module from mRNA expression data and the hubs of the turquoise module from miRNA expression data. A pathway enrichment analysis of the miRNA hubs and some of their targets, using the miRsystem web tool, reveals their enrichment in several important pathways associated with the progression of cancer.
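For readers unfamiliar with WGCNA, its core step for an unsigned network is to turn pairwise correlations into a weighted adjacency by raising the absolute correlation to a soft-thresholding power. The sketch below illustrates only that step and a simple connectivity-based hub ranking; it is not the authors' pipeline, and the power of 6, the function names, and the toy profiles are placeholder assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned weighted adjacency: a_ij = |cor(x_i, x_j)| ** beta.

    profiles: dict mapping a gene or miRNA name to its expression values.
    beta is a placeholder; in practice it is chosen from the data.
    """
    names = list(profiles)
    adj = {a: {} for a in names}
    for a in names:
        for b in names:
            if a != b:
                adj[a][b] = abs(pearson(profiles[a], profiles[b])) ** beta
    return adj


def connectivity(adj):
    """Sum of adjacency weights per node; high values suggest hubs."""
    return {a: sum(neighbors.values()) for a, neighbors in adj.items()}


# Hypothetical expression profiles across four samples.
toy_profiles = {
    "m1": [1.0, 2.0, 3.0, 4.0],
    "m2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with m1
    "m3": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with m1
    "m4": [1.0, 3.0, 2.0, 2.0],  # weakly correlated with the others
}
toy_adj = soft_threshold_adjacency(toy_profiles)
toy_k = connectivity(toy_adj)
```

Because the network is unsigned, the anti-correlated profile m3 is connected as strongly as m2, while the weak correlation of m4 is pushed close to zero by the power transform. Module detection (hierarchical clustering on topological overlap) is beyond this sketch.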


2020 ◽  
Author(s):  
Christopher A Mancuso ◽  
Jacob L Canfield ◽  
Deepak Singla ◽  
Arjun Krishnan

Abstract While there are >2 million publicly-available human microarray gene-expression profiles, these profiles were measured using a variety of platforms that each cover a pre-defined, limited set of genes. Therefore, key to reanalyzing and integrating this massive data collection are methods that can computationally reconstitute the complete transcriptome in partially-measured microarray samples by imputing the expression of unmeasured genes. Current state-of-the-art imputation methods are tailored to samples from a specific platform and rely on gene-gene relationships regardless of the biological context of the target sample. We show that sparse regression models that capture sample-sample relationships (termed SampleLASSO), built on-the-fly for each new target sample to be imputed, outperform models based on fixed gene relationships. Extensive evaluation involving three machine learning algorithms (LASSO, k-nearest-neighbors, and deep-neural-networks), two gene subsets (GPL96-570 and LINCS), and three imputation tasks (within and across microarray/RNA-seq) establishes that SampleLASSO is the most accurate model. Additionally, we demonstrate the biological interpretability of this method by showing that, for imputing a target sample from a certain tissue, SampleLASSO automatically leverages training samples from the same tissue. Thus, SampleLASSO is a simple, yet powerful and flexible approach for harmonizing large-scale gene-expression data.
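One way to read the sample-sample idea described above is sketched here: the target sample's measured genes act as the regression response, each fully profiled training sample acts as a predictor, and the sparse weights learned over samples are then applied to the training samples' values for the unmeasured genes. This is an interpretation of the abstract rather than the authors' code; the plain coordinate-descent solver, the absence of an intercept or standardization, the penalty value, and all names and numbers are assumptions.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the LASSO proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(columns, y, alpha=0.05, n_iter=200):
    """Minimise (1/2m)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    columns: list of predictor columns, each a list of length m (no intercept).
    """
    m = len(y)
    w = [0.0] * len(columns)
    residual = list(y)  # equals y - Xw, with w starting at zero
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            z = sum(v * v for v in col) / m
            if z == 0.0:
                continue
            # Fit of column j to the residual with its own contribution added back.
            rho = sum(v * (r + v * w[j]) for v, r in zip(col, residual)) / m
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            if delta:
                residual = [r - v * delta for r, v in zip(residual, col)]
                w[j] = new_w
    return w


def impute_sample(target_measured, train_measured, train_unmeasured, alpha=0.05):
    """Learn sparse weights over training samples, then impute unmeasured genes.

    train_measured[s]: training sample s over the genes measured in the target.
    train_unmeasured[s]: training sample s over the genes to be imputed.
    """
    w = lasso_coordinate_descent(train_measured, target_measured, alpha)
    n_unmeasured = len(train_unmeasured[0])
    imputed = [
        sum(w[s] * train_unmeasured[s][g] for s in range(len(w)))
        for g in range(n_unmeasured)
    ]
    return w, imputed


# Hypothetical data: the target resembles training sample 0 exactly.
train_measured = [[1.0, 2.0, 3.0, 4.0], [4.0, 1.0, 3.0, 2.0], [2.0, 2.0, 1.0, 1.0]]
train_unmeasured = [[10.0, 20.0], [5.0, 1.0], [7.0, 7.0]]
target = [1.0, 2.0, 3.0, 4.0]
weights, imputed = impute_sample(target, train_measured, train_unmeasured)
```

In this toy case only the matching training sample receives a non-zero weight, which mirrors the abstract's observation that the model tends to lean on training samples from the same tissue as the target.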

