Identification of candidate cancer drivers by integrative Epi-DNA and Gene Expression (iEDGE) data analysis

Abstract The emergence of large-scale multi-omics data warrants method development for data integration. Genomic studies from cancer patients have identified epigenetic and genetic regulators – such as methylation marks, somatic mutations, and somatic copy number alterations (SCNAs), among others – as predictive features of cancer outcome. However, identification of “driver genes” associated with a given alteration remains a challenge. To this end, we developed a computational tool, iEDGE, to model cis and trans effects of (epi-)DNA alterations and identify potential cis driver genes, where cis and trans genes denote those genes falling within and outside the genomic boundaries of a given (epi-)genetic alteration, respectively. iEDGE first identifies the cis and trans gene expression signatures associated with the presence/absence of a particular epi-DNA alteration across samples. It then applies tests of statistical mediation to determine the cis genes predictive of the trans gene expression. Finally, cis and trans effects are annotated by pathway enrichment analysis to gain insights into the underlying regulatory networks. We used iEDGE to perform integrative analysis of SCNAs and gene expression data from breast cancer and 18 additional cancer types included in The Cancer Genome Atlas (TCGA). Notably, cis gene drivers identified by iEDGE were found to be significantly enriched for known driver genes from multiple compendia of validated oncogenes and tumor suppressors, suggesting that the remainder are of equal importance. Furthermore, predicted drivers were enriched for functionally relevant cancer genes with amplification-driven dependencies, which are of potential prognostic and therapeutic value. All analysis results are accessible at https://montilab.bu.edu/iEDGE.
In summary, integrative analysis of SCNAs and gene expression using iEDGE successfully identified known cancer driver genes and putative cancer therapeutic targets across 19 cancer types in the TCGA. The proposed method can easily be applied to the integration of gene expression profiles with other epi-DNA assays in a variety of disease contexts.
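
The mediation step described above can be illustrated with a small sketch. The abstract does not say which mediation test iEDGE uses, so the example below swaps in the classical Sobel test as an assumption: the alteration status (0/1 per sample) is the exposure, the cis gene's expression is the candidate mediator, and the trans gene's expression is the outcome. All function names are hypothetical and are not part of the iEDGE software.

```python
import math


def _center(values):
    """Subtract the mean so that regressions can omit an explicit intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def slope_with_se(x, y):
    """Slope and its standard error for the regression y ~ 1 + x."""
    xc, yc = _center(x), _center(y)
    sxx = sum(a * a for a in xc)
    slope = sum(a * b for a, b in zip(xc, yc)) / sxx
    rss = sum((b - slope * a) ** 2 for a, b in zip(xc, yc))
    return slope, math.sqrt(rss / (len(x) - 2) / sxx)


def mediator_coef_with_se(x, m, y):
    """Coefficient of m, and its standard error, in y ~ 1 + x + m."""
    xc, mc, yc = _center(x), _center(m), _center(y)
    sxx = sum(a * a for a in xc)
    smm = sum(b * b for b in mc)
    sxm = sum(a * b for a, b in zip(xc, mc))
    sxy = sum(a * c for a, c in zip(xc, yc))
    smy = sum(b * c for b, c in zip(mc, yc))
    det = sxx * smm - sxm ** 2  # zero only if x and m are perfectly collinear
    coef_x = (smm * sxy - sxm * smy) / det
    coef_m = (sxx * smy - sxm * sxy) / det
    rss = sum((c - coef_x * a - coef_m * b) ** 2
              for a, b, c in zip(xc, mc, yc))
    return coef_m, math.sqrt(rss / (len(y) - 3) * sxx / det)


def sobel_test(alteration, cis_expr, trans_expr):
    """Sobel z statistic and two-sided p-value for alteration -> cis -> trans.

    Assumption: the Sobel test stands in for the unspecified mediation test.
    """
    a, se_a = slope_with_se(alteration, cis_expr)
    b, se_b = mediator_coef_with_se(alteration, cis_expr, trans_expr)
    z = (a * b) / math.sqrt(b * b * se_a ** 2 + a * a * se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value
```

In a full analysis this test would be repeated for every cis/trans gene pair of an alteration, with correction for multiple testing; that step is left out of the sketch.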

iEDGE: integration of Epi-DNA and Gene Expression and applications to the discovery of somatic copy number-associated drivers in cancer

10.1101/573824 ◽ 2019

Author(s): Amy Li ◽ Bjoern Chapuy ◽ Xaralabos Varelas ◽ Paola Sebastiani ◽ Stefano Monti

Keyword(s): Gene Expression ◽ Copy Number ◽ Regulatory Networks ◽ Large Scale ◽ Method Development ◽ The Cancer Genome Atlas ◽ Pathway Enrichment Analysis ◽ Cancer Genes ◽ Driver Genes ◽ Cis And Trans

Abstract The emergence of large-scale multi-omics data warrants method development for data integration. Genomic studies from cancer patients have identified epigenetic and genetic regulators – such as methylation marks, somatic mutations, and somatic copy number alterations (SCNAs), among others – as predictive features of cancer outcome. However, identification of “driver genes” associated with a given alteration remains a challenge. To this end, we developed a computational tool, iEDGE, to model cis and trans effects of (epi-)DNA alterations and identify potential cis driver genes, where cis and trans genes denote those genes falling within and outside the genomic boundaries of a given (epi-)genetic alteration, respectively. First, iEDGE identifies the cis and trans genes associated with the presence/absence of a particular epi-DNA alteration across samples. Tests of statistical mediation are then performed to determine the cis genes predictive of the trans gene expression. Finally, cis and trans effects are annotated by pathway enrichment analysis to gain insights into the underlying regulatory networks. We used iEDGE to perform integrative analysis of SCNAs and gene expression data from breast cancer and 18 additional cancer types included in The Cancer Genome Atlas (TCGA). Notably, cis gene drivers identified by iEDGE were found to be significantly enriched for known driver genes from multiple compendia of validated oncogenes and tumor suppressors, suggesting that the remainder are of equal importance. Furthermore, predicted drivers were enriched for functionally relevant cancer genes with amplification-driven dependencies, which are of potential prognostic and therapeutic value. All analysis results are accessible at https://montilab.bu.edu/iEDGE.

Integrating Genetic and Transcriptomic Data to Reveal Pathogenesis and Prognostic Markers of Pancreatic Adenocarcinoma

Frontiers in Genetics ◽ 10.3389/fgene.2021.747270 ◽ 2021 ◽ Vol 12

Author(s): Kaisong Bai ◽ Tong Zhao ◽ Yilong Li ◽ Xinjian Li ◽ Zhantian Zhang ◽ ...

Keyword(s): Gene Expression ◽ Pancreatic Adenocarcinoma ◽ Somatic Mutation ◽ Somatic Mutations ◽ Prognostic Markers ◽ Expression Profiles ◽ Gene Expression Profiles ◽ Genomic Variation ◽ The Cancer Genome Atlas ◽ Driver Genes

Pancreatic adenocarcinoma (PAAD) is one of the deadliest malignancies, and mortality from PAAD has continued to rise even as mortality from other major cancers has improved substantially. Although many studies of PAAD exist, few have dissected its oncogenic mechanisms on the basis of genomic variation. In this study, we integrated somatic mutation data and gene expression profiles obtained by high-throughput sequencing to characterize the pathogenesis of PAAD. The mutation profile, containing 182 samples with 25,470 somatic mutations, was obtained from The Cancer Genome Atlas (TCGA). The mutation landscape was generated, and somatic mutations in PAAD were found to show a preference for particular mutation locations. Combining the mutation matrix with gene expression profiles identified 31 driver genes that were closely associated with tumor cell invasion and apoptosis. Co-expression networks were constructed from 461 genes significantly associated with the driver genes, and the hub gene FAM133A in the network was found to be associated with tumor metastasis. Further, the cascade relationship of somatic mutation, long non-coding RNA (lncRNA), and microRNA (miRNA) was constructed to reveal a new mechanism for the involvement of mutations in post-transcriptional regulation. We also identified prognostic markers that are significantly associated with overall survival (OS) of PAAD patients and constructed a risk score model to identify patients’ survival risk. In summary, our study revealed the pathogenic mechanisms and prognostic markers of PAAD, providing theoretical support for the development of precision medicine.

In Silico Discovery of Candidate Drugs against Covid-19

Viruses ◽ 10.3390/v12040404 ◽ 2020 ◽ Vol 12 (4) ◽ pp. 404 ◽ Cited By ~ 42

Author(s): Claudia Cava ◽ Gloria Bertoli ◽ Isabella Castiglioni

Keyword(s): Gene Expression ◽ In Silico ◽ Expression Profiles ◽ Gene Expression Profiles ◽ Interaction Network ◽ Enrichment Analysis ◽ Basic Mechanism ◽ Cell Receptor ◽ The Cancer Genome Atlas ◽ Pathway Enrichment Analysis

Previous studies reported that angiotensin-converting enzyme 2 (ACE2) is the main cell receptor of SARS-CoV and SARS-CoV-2. It plays a key role in the entry of the virus into the cell to produce the final infection. In the present study we investigated in silico the basic mechanism of ACE2 in the lung and provided evidence for new potentially effective drugs for Covid-19. Specifically, we used gene expression profiles from public datasets, including The Cancer Genome Atlas, Gene Expression Omnibus and Genotype-Tissue Expression, together with Gene Ontology and pathway enrichment analysis, to investigate the main functions of ACE2-correlated genes. We constructed a protein-protein interaction network containing the genes co-expressed with ACE2. Finally, we focused on the genes in the network that are already associated with known drugs and evaluated their role for a potential treatment of Covid-19. Our results demonstrate that the genes correlated with ACE2 are mainly enriched in the sterol biosynthetic process, aryldialkylphosphatase activity, adenosylhomocysteinase activity, trialkylsulfonium hydrolase activity, acetate-CoA and CoA ligase activity. We identified a network of 193 genes, 222 interactions and 36 potential drugs that could have a crucial role. Among possible interesting drugs for Covid-19 treatment, we found Nimesulide, Fluticasone Propionate, Thiabendazole, Photofrin, Didanosine and Flutamide.
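
Selecting ACE2-correlated genes, as described above, amounts to ranking genes by how strongly their expression correlates with ACE2 across samples. The sketch below is only an illustration: the abstract states neither the correlation measure nor the cutoff, so the use of Pearson correlation, the `min_abs_r` threshold, and the function names are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def coexpressed_genes(expression, target="ACE2", min_abs_r=0.5):
    """Return (gene, r) pairs correlated with `target`, strongest first.

    `expression` maps a gene symbol to its expression values across samples.
    The 0.5 cutoff is a placeholder, not a value taken from the study.
    """
    reference = expression[target]
    hits = []
    for gene, values in expression.items():
        if gene == target:
            continue
        r = pearson(reference, values)
        if abs(r) >= min_abs_r:
            hits.append((gene, r))
    return sorted(hits, key=lambda pair: abs(pair[1]), reverse=True)
```

The resulting gene list is the kind of input that would then be passed to Gene Ontology and pathway enrichment tools.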

Single-cell analysis of a tumor-derived exosome signature correlates with prognosis and immunotherapy response

Journal of Translational Medicine ◽ 10.1186/s12967-021-03053-4 ◽ 2021 ◽ Vol 19 (1)

Author(s): Jiani Wu ◽ Dongqiang Zeng ◽ Shimeng Zhi ◽ Zilan Ye ◽ Wenjun Qiu ◽ ...

Keyword(s): Gene Expression ◽ Overall Survival ◽ Single Cell ◽ Immune Modulation ◽ Expression Profiles ◽ Predictive Biomarkers ◽ Disease Diagnosis ◽ The Cancer Genome Atlas ◽ Sequencing Data ◽ Cancer Types

Abstract Background: Tumor-derived exosomes (TEXs) are involved in tumor progression and the immune modulation process and mediate intercellular communication in the tumor microenvironment. Although exosomes are considered promising liquid biomarkers for disease diagnosis, it is difficult to discriminate TEXs and to develop TEX-based predictive biomarkers. Methods: In this study, the gene expression profiles and clinical information were collected from The Cancer Genome Atlas (TCGA) database, IMvigor210 cohorts, and six independent Gene Expression Omnibus datasets. A TEXs-associated signature named TEXscore was established to predict overall survival in multiple cancer types and in patients undergoing immune checkpoint blockade therapies. Results: Based on exosome-associated genes, we first constructed a tumor-derived exosome signature named TEXscore using a principal component analysis algorithm. In single-cell RNA-sequencing data analysis, increasing TEXscore was associated with disease progression and poor clinical outcomes. In the TCGA Pan-Cancer cohort, TEXscore was higher in tumor samples than in normal tissues, thereby serving as a reliable biomarker to distinguish cancer from non-cancer sources. Moreover, high TEXscore was associated with shorter overall survival across 12 cancer types. TEXscore showed great potential in predicting immunotherapy response in melanoma, urothelial cancer, and renal cancer. The immunosuppressive microenvironment characterized by macrophages, cancer-associated fibroblasts, and myeloid-derived suppressor cells was associated with high TEXscore in the TCGA and immunotherapy cohorts. In addition, TEXscore-associated miRNAs and gene mutations were identified. Further experimental research will help extend TEXscore to tumor-associated exosomes. Conclusions: TEXscore, which captures tumor-derived exosome features, might be a robust biomarker for prognosis and treatment responses in independent cohorts.
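
A PCA-based signature score of the kind described above can be sketched as follows. The abstract says only that TEXscore was built from exosome-associated genes with a principal component analysis algorithm, so this example assumes the simplest variant: each sample's score is its projection onto the first principal component of the signature-gene expression matrix. The function name, the power-iteration solver, and the sign convention are illustrative choices, not the published procedure.

```python
import math


def first_pc_scores(matrix, n_iter=200):
    """Project samples onto the first principal component.

    `matrix` is a list of samples, each a list of expression values for the
    signature genes (samples x genes). Returns one score per sample.
    """
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]

    # Sample covariance matrix of the genes (p x p).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]

    # Power iteration for the leading eigenvector. The uniform start vector
    # would fail only if it were exactly orthogonal to that eigenvector.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # all genes constant: no variance to explain
        v = [x / norm for x in w]

    # The sign of a principal component is arbitrary; fix it so that the
    # loadings sum to a non-negative value (an assumed convention).
    if sum(v) < 0:
        v = [-x for x in v]
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]
```

Scores computed this way are relative to the cohort they were fitted on; comparing them across cohorts would require reusing the same gene means and loadings.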

Screening of gene markers related to the prognosis of metastatic skin cutaneous melanoma based on Logit regression and survival analysis

BMC Medical Genomics ◽ 10.1186/s12920-021-00923-0 ◽ 2021 ◽ Vol 14 (1)

Author(s): Guoliang Jia ◽ Zheyu Song ◽ Zhonghang Xu ◽ Youmao Tao ◽ Yuanyu Wu ◽ ...

Keyword(s): Gene Expression ◽ Cutaneous Melanoma ◽ Expression Profiles ◽ The Cancer Genome Atlas ◽ Training Dataset ◽ Validation Dataset ◽ Survival Prognosis ◽ Gene Markers ◽ Clinical Prognosis ◽ Logit Regression

Abstract Background: Bioinformatics was used to analyze the skin cutaneous melanoma (SKCM) gene expression profile to provide a theoretical basis for further studying the mechanism underlying metastatic SKCM and the clinical prognosis. Methods: We downloaded the gene expression profiles of 358 metastatic and 102 primary (nonmetastatic) CM samples from The Cancer Genome Atlas (TCGA) database as a training dataset and the GSE65904 dataset from the National Center for Biotechnology Information database as a validation dataset. Differentially expressed genes (DEGs) were screened using the limma package in R 3.4.1, and prognosis-related feature DEGs were screened using Logit regression (LR) and survival analyses. We also used the STRING online database, Cytoscape software, and Database for Annotation, Visualization and Integrated Discovery software for protein–protein interaction network, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses based on the screened DEGs. Results: Of the 876 DEGs selected, 11 (ZNF750, NLRP6, TGM3, KRTDAP, CAMSAP3, KRT6C, CALML5, SPRR2E, CD3G, RTP5, and FAM83C) were screened using LR analysis. The nonmetastatic group had a better survival prognosis than the metastatic group in both the TCGA training and the validation datasets. The 11 DEGs were involved in 9 KEGG signaling pathways, and of these 11 DEGs, CALML5 was a feature DEG involved in the melanogenesis pathway, 12 targets of which were collected. Conclusion: The feature DEGs screened, such as CALML5, are related to the prognosis of metastatic CM according to LR. Our results provide new ideas for exploring the molecular mechanism underlying CM metastasis and finding new diagnostic and prognostic markers.

Pan-Cancer Transcriptional Models Predicting Chemosensitivity in Human Tumors

Cancer Informatics ◽ 10.1177/11769351211002494 ◽ 2021 ◽ Vol 20 ◽ pp. 117693512110024

Author(s): Jason D Wells ◽ Jacqueline R Griffin ◽ Todd W Miller

Keyword(s): Gene Expression ◽ Expression Profiles ◽ Human Tumors ◽ Molecular Characteristics ◽ Survival Times ◽ Chemotherapy Drugs ◽ Testing Dataset ◽ Cancer Models ◽ Cancer Types ◽ Pan Cancer

Motivation: Despite increasing understanding of the molecular characteristics of cancer, chemotherapy success rates remain low for many cancer types. Studies have attempted to identify patient and tumor characteristics that predict sensitivity or resistance to different types of conventional chemotherapies, yet a concise model that predicts chemosensitivity based on gene expression profiles across cancer types remains to be formulated. We attempted to generate pan-cancer models predictive of chemosensitivity and chemoresistance. Such models may increase the likelihood of identifying the type of chemotherapy most likely to be effective for a given patient based on the overall gene expression of their tumor. Results: Gene expression and drug sensitivity data from solid tumor cell lines were used to build predictive models for 11 individual chemotherapy drugs. Models were validated using datasets from solid tumors from patients. For all drug models, accuracy ranged from 0.81 to 0.93 when applied to all relevant cancer types in the testing dataset. When considering how well the models predicted chemosensitivity or chemoresistance within individual cancer types in the testing dataset, accuracy was as high as 0.98. Cell line–derived pan-cancer models were able to statistically significantly predict sensitivity in human tumors in some instances; for example, a pan-cancer model predicting sensitivity in patients with bladder cancer treated with cisplatin was able to significantly segregate sensitive and resistant patients based on recurrence-free survival times (P = .048) and in patients with pancreatic cancer treated with gemcitabine (P = .038). These models can predict chemosensitivity and chemoresistance across cancer types with clinically useful levels of accuracy.

BART Cancer: a web resource for transcriptional regulators in cancer genomes

NAR Cancer ◽ 10.1093/narcan/zcab011 ◽ 2021 ◽ Vol 3 (1)

Author(s): Zachary V Thomas ◽ Zhenjia Wang ◽ Chongzhi Zang

Keyword(s): Gene Expression ◽ Cancer Research ◽ Computational Prediction ◽ Transcriptional Regulators ◽ Gene Expression Omnibus ◽ The Cancer Genome Atlas ◽ Regulation Of Transcription ◽ Data Resource ◽ Web Resource ◽ Cancer Types

Abstract Dysregulation of gene expression plays an important role in cancer development. Identifying transcriptional regulators, including transcription factors and chromatin regulators, that drive the oncogenic gene expression program is a critical task in cancer research. Genomic profiles of active transcriptional regulators from primary cancer samples are limited in the public domain. Here we present BART Cancer (bartcancer.org), an interactive web resource database to display the putative transcriptional regulators that are responsible for differentially regulated genes in 15 different cancer types in The Cancer Genome Atlas (TCGA). BART Cancer integrates over 10000 gene expression profiling RNA-seq datasets from TCGA with over 7000 ChIP-seq datasets from the Cistrome Data Browser database and the Gene Expression Omnibus (GEO). BART Cancer uses Binding Analysis for Regulation of Transcription (BART) for predicting the transcriptional regulators from the differentially expressed genes in cancer samples compared to normal samples. BART Cancer also displays the activities of over 900 transcriptional regulators across cancer types, by integrating computational prediction results from BART and the Cistrome Cancer database. Focusing on transcriptional regulator activities in human cancers, BART Cancer can provide unique insights into epigenetics and transcriptional regulation in cancer, and is a useful data resource for genomics and cancer research communities.

Identification of CCNB2 expression in triple-negative breast cancer based on bioinformatics results

10.21203/rs.3.rs-506326/v1 ◽ 2021

Author(s): Jintao Cao ◽ Shuai Sun ◽ Ran Li ◽ Rui Min ◽ Xingyu Fan ◽ ...

Keyword(s): Breast Cancer ◽ Gene Expression ◽ Triple Negative Breast Cancer ◽ Protein Complex ◽ Triple Negative ◽ Expression Profiles ◽ Pathway Enrichment Analysis ◽ Analysis Tool ◽ The Core ◽ Core Genes

Abstract Background: Current epidemiology shows that the incidence of breast cancer is increasing year by year and that patients tend to be younger. Triple-negative breast cancer is the most malignant of the breast cancer subtypes. The application of bioinformatics in tumor research is becoming more and more extensive. This study provides research ideas and a basis for exploring potential targets of gene therapy for triple-negative breast cancer (TNBC). Methods: We analyzed three gene expression profiles (GSE64790, GSE62931, GSE38959) selected from the Gene Expression Omnibus (GEO) database. The GEO2R online analysis tool was used to screen for differentially expressed genes (DEGs) between TNBC and normal tissues. Gene Ontology (GO) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were applied to identify the pathways and functional annotation of the DEGs. The protein–protein interaction network of these DEGs was visualized with the Metascape gene-list analysis tool so that we could find the protein complex containing the core genes. Subsequently, we investigated the transcriptional data of the core genes in patients with breast cancer from the Oncomine database. Moreover, the online Kaplan–Meier plotter survival analysis tool was used to evaluate the prognostic value of core gene expression in TNBC patients. Finally, immunohistochemistry (IHC) was used to evaluate the expression level and subcellular localization of CCNB2 in TNBC tissues. Results: A total of 66 DEGs were identified, including 33 up-regulated genes and 33 down-regulated genes. Among them, a potential protein complex containing five core genes was screened out. High expression of these core genes, especially overexpression of CCNB2, was correlated with poor prognosis in patients with breast cancer. CCNB2 protein was positively expressed in the cytoplasm, and its expression in triple-negative breast cancer tissues was significantly higher than that in adjacent tissues. Conclusions: CCNB2 may play a crucial role in the development of TNBC and has potential as a prognostic biomarker of TNBC.

90-Gene Expression Profiling for Tissue Origin Diagnosis of Cancer of Unknown Primary

Frontiers in Oncology ◽ 10.3389/fonc.2021.722808 ◽ 2021 ◽ Vol 11

Author(s): Yi Zhang ◽ Lei Xia ◽ Dawei Ma ◽ Jing Wu ◽ Xinyu Xu ◽ ...

Keyword(s): Gene Expression ◽ Malignant Tumors ◽ Expression Profiles ◽ Cancer Of Unknown Primary ◽ The Cancer Genome Atlas ◽ Unknown Primary ◽ Gene Expression Assay ◽ Primary Location ◽ Cancer Genome Atlas ◽ Expression Characteristics

Cancer of unknown primary (CUP), in which metastatic disease exists without an identifiable primary location, accounts for about 3–5% of all cancer diagnoses. Successful diagnosis and treatment of such patients are difficult. This study aimed to assess the expression characteristics of 90 genes as a method of identifying the primary site from CUP samples. We validated a 90-gene expression assay and explored its potential diagnostic utility in 44 patients at Jiangsu Cancer Hospital. For each specimen, the expression of 90 tumor-specific genes in malignant tumors was analyzed, and similarity scores were obtained. The types of malignant tumors predicted were compared with the reference diagnosis to calculate the accuracy. In addition, we verified the consistency of the expression profiles of the 90 genes in CUP secondary malignancies and metastatic malignancies in The Cancer Genome Atlas. We also reported a detailed description of the next-generation coding sequences for CUP patients. For each clinical specimen collected, the type of malignant tumor predicted by the 90-gene expression assay was compared with its reference diagnosis, and the overall accuracy was 95.4%. In addition, the 90-gene expression profile generally accurately classified CUP into the cluster of its primary tumor. Sequencing of the exome transcriptome containing 556 high-frequency gene mutation oncogenes was not significantly related to the 90-gene analysis. Our results demonstrate that the expression characteristics of these 90 genes can be used as a powerful tool to accurately identify the primary sites of CUP. In the future, the inclusion of the 90-gene expression assay in pathological diagnosis will help oncologists use precise treatments, thereby improving the care and outcomes of CUP patients.

Exploration and validation of a novel prognostic signature based on comprehensive bioinformatics analysis in hepatocellular carcinoma

Bioscience Reports ◽ 10.1042/bsr20203263 ◽ 2020 ◽ Vol 40 (11)

Author(s): Xiaofei Wang ◽ Jie Qiao ◽ Rongqi Wang

Keyword(s): Gene Expression ◽ Hepatocellular Carcinoma ◽ Overall Survival ◽ Risk Score ◽ Expression Profiles ◽ Univariate Analysis ◽ Cancer Genome ◽ The Cancer Genome Atlas ◽ Good Prediction ◽ Expression Levels

Abstract The present study aimed to construct a novel signature for indicating the prognostic outcomes of hepatocellular carcinoma (HCC). Gene expression profiles were downloaded from the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) databases. The prognosis-related genes with differential expression were identified with weighted gene co-expression network analysis (WGCNA), univariate analysis, and the least absolute shrinkage and selection operator (LASSO). With stepwise regression analysis, a risk score was constructed based on the expression levels of five genes: Risk score = (−0.7736 × CCNB2) + (1.0083 × DYNC1LI1) + (−0.6755 × KIF11) + (0.9588 × SPC25) + (1.5237 × KIF18A), which can be applied as a signature for predicting the prognosis of HCC patients. The prediction capacity of the risk score for overall survival was validated with both the TCGA and ICGC cohorts. The 1-, 3- and 5-year ROC curves were plotted, in which the AUC was 0.842, 0.726 and 0.699 in the TCGA cohort and 0.734, 0.691 and 0.700 in the ICGC cohort, respectively. Moreover, the expression levels of the five genes were determined in clinical tumor and normal specimens with immunohistochemistry. The novel signature exhibited good prediction efficacy for the overall survival of HCC patients.
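
The risk score above is a plain linear combination of five expression values, which the following sketch computes. The coefficients are copied from the abstract; the function name, the dictionary input, and the assumption that expression values arrive already normalized the way the study normalized them are illustrative, since the abstract does not state units or preprocessing.

```python
# Coefficients copied from the risk-score formula in the abstract.
COEFFICIENTS = {
    "CCNB2": -0.7736,
    "DYNC1LI1": 1.0083,
    "KIF11": -0.6755,
    "SPC25": 0.9588,
    "KIF18A": 1.5237,
}


def risk_score(expression):
    """Weighted sum of the five genes' expression levels for one patient.

    `expression` maps gene symbols to expression values; extra genes are
    ignored. Normalization of the input is assumed, not specified here.
    """
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())
```

A patient's score only becomes interpretable once a cutoff separating high-risk from low-risk groups is chosen; the abstract does not report one, so none is assumed here.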
