Abstract 393: Predicting DNA accessibility in the pan-cancer tumor genome using RNA-Seq, WGS, and deep learning

Author(s):  
Kamil Wnuk ◽  
Jeremi Sudol ◽  
Shahrooz Rabizadeh ◽  
Patrick Soon-Shiong ◽  
Christopher Szeto ◽  
...  
2021 ◽  
Author(s):  
Javad Noorbakhsh ◽  
Saman Farahmand ◽  
Ali Foroughi pour ◽  
Sandeep Namburi ◽  
Dennis Caruana ◽  
...  

2020 ◽  
Vol 38 (5_suppl) ◽  
pp. 47-47
Author(s):  
Sarabjot Pabla ◽  
Erik Van Roey ◽  
Jeffrey M. Conroy ◽  
Sean Glenn ◽  
Yirong Wang ◽  
...  

Background: Tumor inflammation signatures (TIS) comprising multiple immune genes have been shown to enrich for response to immune checkpoint inhibitors (ICI). To study this immune phenotype in a large cohort of clinically evaluated patients, we analyzed gene expression data to derive a stable pan-cancer tumor inflammation profile and examined its association with clinical response to ICI.
Methods: 1323 FFPE tumors from 35 histologies were tested by RNA-seq, PD-L1 IHC, and DNA-seq for tumor mutational burden (TMB). Unsupervised analysis of the RNA-seq data revealed a cluster of 160 genes that separated inflamed from non-inflamed tumor microenvironments (TME). A TIS, algorithmically defined as the mean mRNA expression of the 160 genes, was developed, and each tumor was assigned to a weak, moderate, or strong inflammation group. PD-L1 IHC was performed using the DAKO 22C3 antibody and considered positive if TPS ≥ 1%. TMB > 10 mut/Mb was considered high. The TIS, PD-L1, and TMB were independently applied to 110 NSCLC cases to test their association with objective response rate (ORR) to ICI by RECIST criteria.
Results: Unsupervised clustering identified 3 inflammation clusters in the 1323 samples: inflamed (n = 439; 33.2%), borderline (n = 467; 35.3%), and non-inflamed (n = 417; 31.5%). The 160 genes are over-represented in T- and B-cell activation, IFN-γ, chemokine, cytokine, and interleukin pathways. The TIS algorithm yields an inflammation score that defines 3 distinct groups: strong (n = 384; 29.0%), moderate (n = 354; 26.8%), and weak (n = 585; 44.2%) inflammation. Strongly inflamed tumors are over-represented by PD-L1+ tumors (240/384), whereas weakly inflamed tumors are significantly under-represented by PD-L1+ tumors (369/585; p = 1.02e-14). Strongly inflamed tumors showed improved ORR to ICI in NSCLC (36.6%; 16/44; p = 0.051). Similar results were observed for overall survival in strongly inflamed tumors (median = 16 months; p = 0.0012) vs. weakly inflamed tumors (median = 8 months). By comparison, ORR was 33.96% for PD-L1+ tumors (p = 0.026) and 21.43% for TMB-high tumors (p = 0.83).
Conclusions: Concurrent measurement of multiple markers led to a comprehensive, stable TIS that predicts host immune response. In NSCLC, a strongly inflamed TIS was associated with higher ORR than either single biomarker, PD-L1 or TMB.
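
The scoring rules in the Methods can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline. The TIS is taken as the plain mean of the signature genes' expression, and the PD-L1 (TPS ≥ 1%) and TMB (> 10 mut/Mb) cutoffs follow the abstract. The abstract reports neither the score cutoffs that separate the weak, moderate, and strong groups nor the gene names, so `WEAK_MAX`, `STRONG_MIN`, and the genes in the usage lines are hypothetical.

```python
from statistics import mean

# Hypothetical cutoffs: the abstract does not report the TIS values that
# separate the weak, moderate and strong inflammation groups.
WEAK_MAX = 1.0
STRONG_MIN = 2.0


def tis_score(expr: dict[str, float], signature: list[str]) -> float:
    """TIS = mean mRNA expression of the signature genes (160 in the abstract)."""
    return mean(expr[gene] for gene in signature)


def tis_group(score: float) -> str:
    """Assign a tumor to a weak, moderate or strong inflammation group."""
    if score >= STRONG_MIN:
        return "strong"
    if score <= WEAK_MAX:
        return "weak"
    return "moderate"


def pdl1_positive(tps_percent: float) -> bool:
    """PD-L1 IHC (22C3) is called positive when TPS >= 1%."""
    return tps_percent >= 1.0


def tmb_high(mut_per_mb: float) -> bool:
    """TMB is called high when it exceeds 10 mutations per megabase."""
    return mut_per_mb > 10.0


# Usage with a hypothetical three-gene signature and one tumor's expression
signature = ["GENE_A", "GENE_B", "GENE_C"]
tumor = {"GENE_A": 2.5, "GENE_B": 1.5, "GENE_C": 3.1}
score = tis_score(tumor, signature)
print(round(score, 3), tis_group(score))
```

Averaging keeps the score on the same scale as single-gene expression. In practice the cutoffs would have to be calibrated on the cohort.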


2021 ◽  
pp. gr.271627.120
Author(s):  
Zhaozhao Zhao ◽  
Qiushi Xu ◽  
Ran Wei ◽  
Weixu Wang ◽  
Dong Ding ◽  
...  

Intronic polyadenylation (IpA) usually leads to changes in the coding region of an mRNA, and its implication in disease has been recognized, although research on it is still at an early stage. Convenient and accurate identification of IpA is important for further evaluating its biological significance. Here, we developed IPAFinder, a bioinformatic method for the de novo identification of intronic poly(A) sites and their dynamic changes from standard RNA-seq data. Applying IPAFinder to 256 pan-cancer tumor/normal pairs across six tumor types, we discovered 490 recurrent dynamically changed IpA events, some of which are novel and derived from cancer-associated genes such as TSC1, SPRED2, and CCND2. Furthermore, IPAFinder revealed that IpA could be regulated by factors related to splicing and m6A modification. In summary, IPAFinder enables the global discovery and characterization of biologically regulated IpA from standard RNA-seq data and should help reveal the biological significance of IpA in various processes.
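
This abstract does not describe IPAFinder's model. The sketch below illustrates only the underlying signal: in RNA-seq, an intronic poly(A) site typically appears as a sharp drop in read coverage partway through an intron. It uses a simple change-point heuristic, not IPAFinder's algorithm. The function scans per-base intronic coverage for the split that maximizes the relative drop between upstream and downstream mean coverage. `candidate_ipa_site`, `min_seg`, and `min_rel_drop` are hypothetical names and parameters.

```python
from typing import Optional


def candidate_ipa_site(coverage: list[float], min_seg: int = 3,
                       min_rel_drop: float = 0.5) -> Optional[int]:
    """Return the index where intronic coverage drops most sharply, or None.

    Split i divides coverage into upstream [0, i) and downstream [i, n);
    the relative drop is (mean_up - mean_down) / mean_up.
    """
    n = len(coverage)
    if n < 2 * min_seg:
        return None
    # Prefix sums give each segment mean in O(1).
    prefix = [0.0]
    for c in coverage:
        prefix.append(prefix[-1] + c)
    best_i, best_drop = None, float("-inf")
    for i in range(min_seg, n - min_seg + 1):
        up = prefix[i] / i
        down = (prefix[n] - prefix[i]) / (n - i)
        if up <= 0:
            continue
        rel_drop = (up - down) / up
        if rel_drop > best_drop:
            best_i, best_drop = i, rel_drop
    # Require a substantial drop before calling a candidate site.
    if best_i is None or best_drop < min_rel_drop:
        return None
    return best_i


# Coverage collapses partway through the intron, as expected near an IpA site
print(candidate_ipa_site([20, 22, 21, 19, 20, 3, 2, 1, 2, 3]))
```

A real analysis would derive coverage from aligned reads. It would also compare tumor and matched normal samples to find dynamically changed sites, which this sketch does not attempt.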


2021 ◽  
Vol 7 (34) ◽  
pp. eabh1275
Author(s):  
Yu-Chiao Chiu ◽  
Siyuan Zheng ◽  
Li-Ju Wang ◽  
Brian S. Iskra ◽  
Manjeet K. Rao ◽  
...  

Genome-wide loss-of-function screens have revealed genes essential for cancer cell proliferation, called cancer dependencies. It remains challenging to link cancer dependencies to the molecular compositions of cancer cells or to unscreened cell lines and further to tumors. Here, we present DeepDEP, a deep learning model that predicts cancer dependencies using integrative genomic profiles. It uses a unique unsupervised pretraining step that learns representations of unlabeled tumor genomic profiles to improve the learning of cancer dependencies. We demonstrated DeepDEP’s improvement over conventional machine learning methods and validated its performance on three independent datasets. Through systematic model interpretation, we extended the current dependency maps with functional characterizations of dependencies and a proof-of-concept in silico assay of synthetic essentiality. We applied DeepDEP to pan-cancer tumor genomics and built the first pan-cancer synthetic dependency map of 8000 tumors with clinical relevance. In summary, DeepDEP is a novel tool for investigating cancer dependency with rapidly growing genomic resources.
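
The strategy of pretraining on unlabeled tumor profiles and then learning dependencies from a much smaller labeled set can be shown with a deliberately simplified linear stand-in. The NumPy sketch below is not DeepDEP's architecture. PCA, which recovers the same subspace as an optimal linear autoencoder, is fit on a large unlabeled "tumor" matrix. A ridge regression is then trained on the encoded profiles of a small labeled "cell line" set. All data are synthetic, and the function names and dimensions are hypothetical.

```python
import numpy as np


def pretrain_encoder(unlabeled: np.ndarray, k: int):
    """Unsupervised step: top-k principal directions of the unlabeled profiles."""
    mu = unlabeled.mean(axis=0)
    _, _, vt = np.linalg.svd(unlabeled - mu, full_matrices=False)
    return mu, vt[:k]


def encode(x: np.ndarray, mu: np.ndarray, comps: np.ndarray) -> np.ndarray:
    return (x - mu) @ comps.T


def fit_ridge(z: np.ndarray, y: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Supervised step: ridge regression on encoded features (intercept column appended)."""
    z1 = np.hstack([z, np.ones((z.shape[0], 1))])
    return np.linalg.solve(z1.T @ z1 + lam * np.eye(z1.shape[1]), z1.T @ y)


def predict(z: np.ndarray, w: np.ndarray) -> np.ndarray:
    return np.hstack([z, np.ones((z.shape[0], 1))]) @ w


rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 50))  # hidden 3-factor structure shared by all samples
tumors = rng.normal(size=(500, 3)) @ mixing + 0.01 * rng.normal(size=(500, 50))
latent = rng.normal(size=(40, 3))
cells = latent @ mixing + 0.01 * rng.normal(size=(40, 50))
dependency = latent @ np.array([1.0, -2.0, 0.5])  # synthetic dependency score

mu, comps = pretrain_encoder(tumors, k=3)  # uses only unlabeled tumors
w = fit_ridge(encode(cells, mu, comps), dependency)
pred = predict(encode(cells, mu, comps), w)
r2 = 1 - np.sum((dependency - pred) ** 2) / np.sum((dependency - dependency.mean()) ** 2)
print(f"in-sample R^2 = {r2:.3f}")
```

The point of the design is that the encoder never sees labels. The 40 labeled samples therefore only have to fit 4 regression weights instead of 51.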


Author(s):  
Yang Xu ◽  
Priyojit Das ◽  
Rachel Patton McCord

Motivation: Deep learning approaches have empowered single-cell omics data analysis in many ways and generated new insights into complex cellular systems. As the need to integrate single-cell omics data across sources, types, and features grows, so do the challenges of doing so. Here, we present SMILE (Single-cell Mutual Information Learning), an unsupervised deep learning algorithm that learns discriminative representations for single-cell data by maximizing mutual information.
Results: Using a unique cell-pairing design, SMILE successfully integrates multi-source single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into a shared space. SMILE can also integrate data from two or more modalities, such as joint profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C, and ChIP data. When paired cells are known, SMILE can integrate data with unmatched features, such as genes for RNA-seq and genome-wide peaks for ATAC-seq. Integrated representations learned from joint profiling technologies can then be used as a framework for comparing independent single-source data.
Supplementary information: Supplementary data are available at Bioinformatics online. The source code of SMILE, including analyses of key results in the study, can be found at: https://github.com/rpmccordlab/SMILE.
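
SMILE's exact objective is not given in this abstract. One common way to maximize mutual information between the two members of a cell pair is a contrastive InfoNCE loss, because minimizing it maximizes a lower bound on that mutual information. The NumPy sketch below computes a symmetric InfoNCE loss for a batch of paired embeddings. It assumes that row i of each matrix forms a positive pair, and it illustrates the general technique rather than SMILE's implementation. The `temperature` value is hypothetical.

```python
import numpy as np


def info_nce(za: np.ndarray, zb: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss; row i of za and row i of zb are a positive pair."""
    # Cosine similarity between every pair of cells across the two views.
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature

    def diag_cross_entropy(l: np.ndarray) -> float:
        # Numerically stable log-softmax over each row; the target is the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_prob)))

    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


rng = np.random.default_rng(0)
za = rng.normal(size=(64, 16))                  # embeddings of cells in view A
zb = za + 0.05 * rng.normal(size=za.shape)      # matched cells in view B
shuffled = zb[rng.permutation(64)]              # pairing destroyed
print(info_nce(za, zb), info_nce(za, shuffled))
```

Correctly paired embeddings give a much lower loss than shuffled ones. That gap is what drives paired cells from different sources or modalities toward a shared space.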


2018 ◽  
Vol 19 (10) ◽  
pp. 3250 ◽  
Author(s):  
Anna Sorrentino ◽  
Antonio Federico ◽  
Monica Rienzo ◽  
Patrizia Gazzerro ◽  
Maurizio Bifulco ◽  
...  

The PR/SET domain gene family (PRDM) encodes 19 different transcription factors that share a subtype of the SET domain [Su(var)3-9, enhancer-of-zeste and trithorax] known as the PRDF1-RIZ (PR) homology domain. This domain, with its potential methyltransferase activity, is followed by a variable number of zinc-finger motifs, which likely mediate protein–protein, protein–RNA, or protein–DNA interactions. Intriguingly, almost all PRDM family members express different isoforms, which likely play opposite roles in oncogenesis. Remarkably, several studies have described alterations in most family members in malignancies. Here, to obtain a pan-cancer overview of the genomic and transcriptomic alterations of PRDM genes, we reanalyzed the Exome- and RNA-Seq public datasets available at The Cancer Genome Atlas portal. Overall, PRDM2, PRDM3/MECOM, PRDM9, PRDM16, and ZFPM2/FOG2 were the most mutated genes, with pan-cancer frequencies of protein-affecting mutations higher than 1%. Moreover, we observed heterogeneity in the mutation frequencies of these genes across tumors, with some cancer types reaching about 20% of samples mutated for a specific PRDM gene. Of note, ZFPM1/FOG1 mutations occurred in 50% of adrenocortical carcinoma patients and were localized in a hotspot region. These findings, together with OncodriveCLUST results, suggest that ZFPM1/FOG1 could be considered a putative cancer driver gene in this malignancy. Finally, transcriptome analysis of RNA-Seq data from paired samples revealed that transcription of PRDMs was significantly altered in several tumors. Specifically, PRDM12 and PRDM13 were largely overexpressed in many cancers, whereas PRDM16 and ZFPM2/FOG2 were often downregulated. Some of these findings were also confirmed by real-time PCR on primary tumors.
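
A pan-cancer mutation frequency, as used above, is the fraction of samples carrying at least one protein-affecting mutation in a gene. The Python sketch below computes it from MAF-style records. Which variant classes count as "protein-affecting" is an assumption here: the set uses standard MAF `Variant_Classification` labels and may differ from the authors' filter. The sample IDs and counts in the usage example are made up.

```python
from collections import defaultdict

# Assumed definition of "protein-affecting" (standard MAF Variant_Classification values)
PROTEIN_AFFECTING = {
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site", "Translation_Start_Site",
}


def mutation_frequencies(records, n_samples: int) -> dict[str, float]:
    """records: iterable of (sample_id, gene, variant_classification)."""
    mutated = defaultdict(set)
    for sample, gene, vclass in records:
        if vclass in PROTEIN_AFFECTING:
            mutated[gene].add(sample)  # count each sample at most once per gene
    return {gene: len(samples) / n_samples for gene, samples in mutated.items()}


def frequently_mutated(freqs: dict[str, float], threshold: float = 0.01) -> list[str]:
    """Genes whose mutation frequency exceeds the threshold (1% by default)."""
    return sorted(gene for gene, f in freqs.items() if f > threshold)


# Made-up cohort of 200 samples
records = [
    ("S1", "PRDM9", "Missense_Mutation"),
    ("S1", "PRDM9", "Nonsense_Mutation"),  # second hit in S1 is not double-counted
    ("S2", "PRDM9", "Frame_Shift_Del"),
    ("S3", "PRDM9", "Splice_Site"),
    ("S4", "PRDM1", "Missense_Mutation"),
    ("S5", "PRDM1", "Silent"),             # silent change is ignored
]
freqs = mutation_frequencies(records, n_samples=200)
print(freqs, frequently_mutated(freqs))
```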


2018 ◽  
Author(s):  
Uri Shaham

Biological measurements often contain systematic errors, also known as “batch effects”, which may invalidate downstream analysis when not handled correctly. Removing batch effects is a problem of major importance in the biological community. Despite recent advances in this direction via deep learning techniques, most current methods may not fully preserve the true biological patterns in the data. In this work we propose a deep learning approach for batch effect removal. The crux of our approach is learning a batch-free encoding of the data that represents its intrinsic biological properties but not batch effects. In addition, we encode the systematic factors through a decoding mechanism and require accurate reconstruction of the data. Altogether, this allows us to fully preserve the true biological patterns represented in the data. Experimental results are reported on data obtained from two high-throughput technologies, mass cytometry and single-cell RNA-seq. Beyond good performance on training data, we also observe that our system performs well on test data obtained from new patients, which were not available at training time. Our method is easy to use, and publicly available code can be found at https://github.com/ushaham/BatchEffectRemoval2018.
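
This approach has two requirements: a batch-free code, and a decoder that re-injects the batch factors to reconstruct the input. Both can be shown with a deliberately simple linear toy. In the NumPy sketch below, which substitutes per-batch mean-centering for the authors' deep network, each batch's mean profile plays the role of the "systematic factor". Encoding subtracts it and decoding adds it back. The function names and data are hypothetical.

```python
import numpy as np


def fit_batch_codes(x: np.ndarray, batch: np.ndarray) -> dict:
    """Toy 'systematic factor' per batch: the batch's mean profile."""
    return {b: x[batch == b].mean(axis=0) for b in np.unique(batch)}


def encode(x: np.ndarray, batch: np.ndarray, codes: dict) -> np.ndarray:
    """Batch-free code: subtract each sample's batch mean."""
    return x - np.stack([codes[b] for b in batch])


def decode(z: np.ndarray, batch: np.ndarray, codes: dict) -> np.ndarray:
    """Reconstruction re-injects the batch factor."""
    return z + np.stack([codes[b] for b in batch])


rng = np.random.default_rng(1)
# Batch 1 is the same distribution shifted by +5 in every feature.
x = np.vstack([rng.normal(size=(50, 4)), rng.normal(size=(50, 4)) + 5.0])
batch = np.array([0] * 50 + [1] * 50)
codes = fit_batch_codes(x, batch)
z = encode(x, batch, codes)
print(np.round(z[batch == 1].mean(axis=0), 6), np.allclose(decode(z, batch, codes), x))
```

Per-batch centering also removes genuine biological differences whenever batches differ in composition. Avoiding that kind of loss is the aim of the learned, nonlinear encoding described in the abstract.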


2018 ◽  
Author(s):  
Matthew H. Ung ◽  
Evelien Schaafsma ◽  
Daniel E. Mattox ◽  
George L. Wang ◽  
Chao Cheng

The “dark matter” of the genome harbors several non-coding RNA species, including lncRNAs, which have been implicated in neoplasias but remain understudied. RNA-seq has provided deep insights into the nature of lncRNAs in cancer, but current RNA-seq data are rarely accompanied by longitudinal patient survival information. In contrast, a plethora of microarray studies have collected such clinical metadata, which can be leveraged to identify novel associations between gene expression and clinical phenotypes. In this study, we developed an analysis framework that computationally integrates RNA-seq and microarray data to systematically screen 9,463 lncRNAs for association with mortality risk across 20 cancer types. In total, we identified a comprehensive list of associations between lncRNAs and patient survival and demonstrated that these prognostic lncRNAs are under selective pressure and may be functional. Our results provide valuable insights that facilitate further exploration of lncRNAs and their potential as cancer biomarkers and drug targets.
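
The abstract does not specify the survival statistic used. A common way to screen a single lncRNA for association with mortality is to split patients at the median expression and compare the two groups with a log-rank test. The Python sketch below implements that standard test from scratch as an illustration, not as the authors' framework. The expression values and follow-up times in the usage example are made up.

```python
import math
import statistics


def median_split(expr: list[float]) -> list[int]:
    """1 = expression above the cohort median, 0 otherwise."""
    med = statistics.median(expr)
    return [1 if v > med else 0 for v in expr]


def logrank(times: list[float], events: list[int], group: list[int]) -> tuple[float, float]:
    """Two-group log-rank test.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored;
    group: 0/1 label. Returns (chi-square statistic with 1 df, p-value).
    """
    data = list(zip(times, events, group))
    o_minus_e, var = 0.0, 0.0
    for t in sorted({tt for tt, e, _ in data if e}):
        n = sum(1 for tt, _, _ in data if tt >= t)              # at risk overall
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)  # at risk in group 1
        d = sum(1 for tt, e, _ in data if tt == t and e)        # deaths at time t
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n                            # observed - expected
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    # Upper tail of chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Made-up cohort: high expressers die earlier
expr = [9.1, 8.7, 9.5, 8.9, 9.3, 9.0, 2.1, 2.5, 1.9, 2.8, 2.2, 2.4]
times = [1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 15]
events = [1] * 12
chi2, p = logrank(times, events, median_split(expr))
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Screening thousands of lncRNAs this way would also call for multiple-testing correction, which the sketch omits.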


2018 ◽  
Author(s):  
Yeping Lina Qiu ◽  
Hong Zheng ◽  
Olivier Gevaert

Motivation: The presence of missing values is a frequent problem in genomic data analysis. Missing data can be an obstacle to downstream analyses that require complete data matrices. State-of-the-art imputation techniques, including Singular Value Decomposition (SVD) and K-Nearest Neighbors (KNN) based methods, usually achieve good performance but are computationally expensive, especially for large datasets such as those involved in pan-cancer analysis.
Results: This study describes a new method, a denoising autoencoder with partial loss (DAPL), as a deep learning based alternative for data imputation. Results on pan-cancer gene expression data and DNA methylation data from over 11,000 samples demonstrate significant improvement over a standard denoising autoencoder in both missing-at-random cases, across a range of missing percentages, and missing-not-at-random cases based on expression level and GC content. We discuss the advantages of DAPL over traditional imputation methods and show that it achieves comparable or better performance with less computational burden.
Availability: https://github.com/gevaertlab/[email protected]
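
As the method's name suggests, the key change from a standard denoising autoencoder is that the reconstruction loss is computed only on entries that were actually observed, so missing values never contribute error. The NumPy sketch below shows that partial loss together with a typical denoising corruption step. The network itself is omitted. The `drop_rate` parameter and the choice to fill missing or dropped inputs with 0 are assumptions, not details from the abstract.

```python
import numpy as np


def corrupt(x: np.ndarray, observed: np.ndarray, drop_rate: float,
            rng: np.random.Generator) -> np.ndarray:
    """Denoising input: zero out missing entries and a random fraction of observed ones."""
    keep = rng.random(x.shape) >= drop_rate
    return np.where(observed & keep, x, 0.0)


def partial_loss(x: np.ndarray, x_hat: np.ndarray, observed: np.ndarray) -> float:
    """Mean squared error over observed entries only."""
    diff = (x - x_hat)[observed]
    return float(np.mean(diff ** 2)) if diff.size else 0.0


x = np.array([[1.0, np.nan], [3.0, 4.0]])     # NaN marks a missing value
observed = ~np.isnan(x)
x_hat = np.array([[1.0, 100.0], [3.0, 5.0]])  # a reconstruction from some model
print(partial_loss(x, x_hat, observed))       # the 100.0 at the missing slot is ignored
print(corrupt(x, observed, drop_rate=0.0, rng=np.random.default_rng(0)))
```

After training on the partial loss, the decoder's outputs at the missing positions serve as the imputed values.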

