Predicting pathogenic non-coding SVs disrupting the 3D genome in 1646 whole cancer genomes using multiple instance learning

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Marleen M. Nieboer ◽  
Luan Nguyen ◽  
Jeroen de Ridder

Abstract Over the past years, large consortia have been established to fuel the sequencing of whole genomes of many cancer patients. Despite the increased abundance of tools to study the impact of SNVs, non-coding SVs have been largely ignored in these data. Here, we introduce svMIL2, an improved version of our Multiple Instance Learning-based method to study the effect of somatic non-coding SVs disrupting boundaries of TADs and CTCF loops in 1646 cancer genomes. We demonstrate that svMIL2 predicts pathogenic non-coding SVs with an average AUC of 0.86 across 12 cancer types, and identifies non-coding SVs affecting well-known driver genes. The disruption of active (super) enhancers in open chromatin regions appears to be a common mechanism by which non-coding SVs exert their pathogenicity. Finally, our results reveal that the contribution of pathogenic non-coding SVs as opposed to driver SNVs may vary widely between cancers, with notably high numbers of genes being disrupted by pathogenic non-coding SVs in ovarian and pancreatic cancer. Taken together, our machine learning method offers a potent way to prioritize putatively pathogenic non-coding SVs and leverage non-coding SVs to identify driver genes. Moreover, our analysis of 1646 cancer genomes demonstrates the importance of including non-coding SVs in cancer diagnostics.

2021 ◽  
Vol 4 (9) ◽  
pp. e201900523
Author(s):  
Husen M Umer ◽  
Karolina Smolinska ◽  
Jan Komorowski ◽  
Claes Wadelius

In a cancer genome, the noncoding sequence contains the vast majority of somatic mutations. While very few are expected to be cancer drivers, those affecting regulatory elements have the potential to have downstream effects on gene regulation that may contribute to cancer progression. To prioritize regulatory mutations, we screened somatic mutations in the Pan-Cancer Analysis of Whole Genomes cohort of 2,515 cancer genomes on individual bases to assess their potential regulatory roles in their respective cancer types. We found a highly significant enrichment of regulatory mutations associated with the deamination signature overlapping a CpG site in the CCAAT/Enhancer Binding Protein β recognition sites in many cancer types. Overall, 5,749 mutated regulatory elements were identified in 1,844 tumor samples from 39 cohorts containing 11,962 candidate regulatory mutations. Our analysis indicated 20 or more regulatory mutations in 5.5% of the samples, and an overall average of six per tumor. Several recurrent elements were identified, and major cancer-related pathways were significantly enriched for genes near the mutated regulatory elements. Our results provide a detailed view of the role of regulatory elements in cancer genomes.


2021 ◽  
Author(s):  
Yongxing Du ◽  
Zongting Gu ◽  
Zongze Li ◽  
Zan Yuan ◽  
Yue Zhao ◽  
...  

Structural variations (SVs) are the greatest source of variation in the genome and can lead to oncogenesis. However, the identification and interpretation of SVs in human pancreatic cancer remain largely undefined due to technological limitations. Here, we investigate the spectrum of SVs and three-dimensional (3D) chromatin architecture in human pancreatic ductal epithelial cell carcinogenesis by using state-of-the-art long-read single-molecule real-time (SMRT) and high-throughput chromosome conformation capture (Hi-C) sequencing techniques. We find that the 3D genome organization is remodeled and that this remodeling correlates with changes in gene expression. The bulk remodeling effect of cross-boundary SVs in the 3D genome partly depends on intercellular genomic heterogeneity. Meanwhile, contact domains tend to minimize these disrupting effects of SVs within local adjacent genomic regions to maintain overall stability of 3D genome organization. Moreover, our data also demonstrate complex genomic rearrangements involving two key driver genes, CDKN2A and SMAD4, and elucidate their influence on cancer-related gene expression from both a linear and a 3D perspective. Overall, this study provides a valuable resource and highlights the impact, complexity and dynamicity of the interplay between SVs and 3D genome organization, which further expands our understanding of the pathogenesis of SVs in human pancreatic cancer.


2020 ◽  
Author(s):  
Gulden Olgun ◽  
Oznur Tastan

Abstract The dysregulation of long non-coding RNA (lncRNA) expression has been implicated in cancer. Since most lncRNAs are not well characterized functionally, investigating the set of perturbed lncRNAs is challenging. Existing methods that inspect lncRNAs functionally rely on the co-expressed coding genes, which are far better characterized functionally. LncRNAs are known to act as transcriptional regulators; they may activate or repress neighboring coding genes on the genome. Based on this, in this work, we aim to analyze the deregulated lncRNAs in cancer by taking into account their ability to regulate nearby loci on the genome. We perform functional analysis on differentially expressed lncRNAs for 28 different cancers considering their adjacent coding genes. We identify that some deregulated lncRNAs are cancer-specific, but a substantial number of lncRNAs are shared across cancers. Next, we assess the similarities of the cancer types based on the functional enrichment of the deregulated lncRNA sets. We find some cancers are very similar in the functions and biological processes related to the deregulated lncRNAs. We observe that some of the cancers for which we find similarity can be linked through primary, metastatic site relations. We investigate the similarity of enriched functional terms for the deregulated lncRNAs and the mRNAs. We further assess the enriched functions’ similarity to the functions and processes in which the known cancer driver genes take part. We believe that our methodology helps to understand the impact of the lncRNAs in cancer functionally.


2019 ◽  
Author(s):  
Tuan Trieu ◽  
Ekta Khurana

Three-dimensional structures of the genome play an important role in regulating the expression of genes. Non-coding variants have been shown to alter 3D genome structures to activate oncogenes in cancer. However, there is currently no method to predict the effect of DNA variants on 3D structures. We propose a deep learning method, DeepMILO, to learn DNA sequence features of CTCF/cohesin-mediated loops and to predict the effect of variants on these loops. DeepMILO consists of a convolutional and a recurrent neural network, and it can learn features beyond the presence of CTCF motifs and their orientations. Application of DeepMILO on a cohort of 241 malignant lymphoma patients with whole-genome sequences revealed CTCF/cohesin-mediated loops disrupted in multiple patients. These disrupted loops contain known cancer driver genes and novel genes. Our results show mutations at loop boundaries are associated with upregulation of the cancer driver gene BCL2 and may point to a possible new mechanism for its dysregulation via alteration of 3D loop structures.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tejaswi Iyyanki ◽  
Baozhen Zhang ◽  
Qixuan Wang ◽  
Ye Hou ◽  
Qiushi Jin ◽  
...  

Abstract Muscle-invasive bladder cancers are characterized by their distinct expression of luminal and basal genes, which could be used to predict key clinical features such as disease progression and overall survival. Transcriptionally, FOXA1, GATA3, and PPARG are shown to be essential for luminal subtype-specific gene regulation and subtype switching, while TP63, STAT3, and TFAP2 family members are critical for regulation of basal subtype-specific genes. Despite these advances, the underlying epigenetic mechanisms and 3D chromatin architecture responsible for subtype-specific regulation in bladder cancer remain unknown. Result: We determine the genome-wide transcriptome, enhancer landscape, and transcription factor binding profiles of FOXA1 and GATA3 in luminal and basal subtypes of bladder cancer. Furthermore, we report the first-ever mapping of genome-wide chromatin interactions by Hi-C in both bladder cancer cell lines and primary patient tumors. We show that subtype-specific transcription is accompanied by specific open chromatin and epigenomic marks, at least partially driven by distinct transcription factor binding at distal enhancers of luminal and basal bladder cancers. Finally, we identify a novel clinically relevant transcription factor, Neuronal PAS Domain Protein 2 (NPAS2), in luminal bladder cancers that regulates other subtype-specific genes and influences cancer cell proliferation and migration. Conclusion: In summary, our work identifies unique epigenomic signatures and 3D genome structures in luminal and basal urinary bladder cancers and suggests a novel link between the circadian transcription factor NPAS2 and a clinical bladder cancer subtype.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ege Ülgen ◽  
O. Uğur Sezerman

Abstract Background: Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomics data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomics data, called driveR. Results: Combining genomics information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Testing on 28 different datasets, this study demonstrates that driveR performs adequately, achieving a median AUC of 0.684 (range 0.651–0.861) on the 28 batch analysis test datasets, and a median AUC of 0.773 (range 0–1) on the 5157 personalized analysis test samples. Moreover, it outperforms existing approaches, achieving a significantly higher median AUC than all of MutSigCV (Wilcoxon rank-sum test p < 0.001), DriverNet (p < 0.001), OncodriveFML (p < 0.001) and MutPanning (p < 0.001) on batch analysis test datasets, and a significantly higher median AUC than DawnRank (p < 0.001) and PRODIGY (p < 0.001) on personalized analysis datasets. Conclusions: This study demonstrates that the proposed method is an accurate and easy-to-utilize approach for prioritizing driver genes in cancer genomes in personalized or batch analyses. driveR is available on CRAN: https://cran.r-project.org/package=driveR.


2021 ◽  
pp. 153537022110312
Author(s):  
Kenneth S Ramos ◽  
Pasano Bojang ◽  
Emma Bowers

LINE-1 retrotransposon, the most active mobile element of the human genome, is subject to tight regulatory control. Stressful environments and disease modify the recruitment of regulatory proteins leading to unregulated activation of LINE-1. The activation of LINE-1 influences genome dynamics through altered chromatin landscapes, insertion mutations, deletions, and modulation of cellular plasticity. To date, LINE-1 retrotransposition has been linked to various cancer types and may in fact underwrite the genetic basis of various other forms of chronic human illness. The occurrence of LINE-1 polymorphisms in the human population may define inter-individual differences in susceptibility to disease. This review is written in honor of Dr Peter Stambrook, a friend and colleague who carried out highly impactful cancer research over many years of professional practice. Dr Stambrook devoted considerable energy to helping others live up to their full potential and to navigate the complexities of professional life. He was an inspirational leader, a strong advocate, a kind mentor, a vocal supporter and cheerleader, and yes, a hard critic and tough friend when needed. His passionate stand on issues, his witty sense of humor, and his love for humanity have left a huge mark on our lives. We hope that the knowledge summarized here will advance our understanding of the role of LINE-1 in cancer biology and expedite the development of innovative cancer diagnostics and treatments in the ways that Dr Stambrook himself had so passionately envisioned.


2020 ◽  
Vol 49 (D1) ◽  
pp. D38-D46
Author(s):  
Kyukwang Kim ◽  
Insu Jang ◽  
Mooyoung Kim ◽  
Jinhyuk Choi ◽  
Min-Seo Kim ◽  
...  

Abstract Three-dimensional (3D) genome organization is tightly coupled with gene regulation in various biological processes and diseases. In cancer, various types of large-scale genomic rearrangements can disrupt the 3D genome, leading to oncogenic gene expression. However, unraveling the pathogenicity of the 3D cancer genome remains a challenge since closer examinations have been greatly limited due to the lack of appropriate tools specialized for disorganized higher-order chromatin structure. Here, we updated a 3D-genome Interaction Viewer and database named 3DIV by uniformly processing ∼230 billion raw Hi-C reads to expand our contents to the 3D cancer genome. The updates of 3DIV are listed as follows: (i) the collection of 401 samples including 220 cancer cell line/tumor Hi-C data, 153 normal cell line/tissue Hi-C data, and 28 promoter capture Hi-C data, (ii) the live interactive manipulation of the 3D cancer genome to simulate the impact of structural variations and (iii) the reconstruction of Hi-C contact maps by user-defined chromosome order to investigate the 3D genome of the complex genomic rearrangement. In summary, the updated 3DIV will be the most comprehensive resource to explore the gene regulatory effects of both the normal and cancer 3D genome. ‘3DIV’ is freely available at http://3div.kr.


2021 ◽  
Vol 11 (13) ◽  
pp. 5895
Author(s):  
Kristina Serec ◽  
Sanja Dolanski Babić

The double-stranded B-form and A-form have long been considered the two most important native forms of DNA, each with its own distinct biological roles and hence the focus of many areas of study, from cellular functions to cancer diagnostics and drug treatment. Due to the heterogeneity and sensitivity of the secondary structure of DNA, there is a need for tools capable of a rapid and reliable quantification of DNA conformation in diverse environments. In this work, the second paper in the series that addresses conformational transitions in DNA thin films utilizing FTIR spectroscopy, we exploit popular chemometric methods: the principal component analysis (PCA), support vector machine (SVM) learning algorithm, and principal component regression (PCR), in order to quantify and categorize DNA conformation in thin films of different hydrated states. By complementing FTIR technique with multivariate statistical methods, we demonstrate the ability of our sample preparation and automated spectral analysis protocol to rapidly and efficiently determine conformation in DNA thin films based on the vibrational signatures in the 1800–935 cm⁻¹ range. Furthermore, we assess the impact of small hydration-related changes in FTIR spectra on automated DNA conformation detection and how to avoid discrepancies by careful sampling.


Endocrinology ◽  
2021 ◽  
Author(s):  
Chenghao Zhu ◽  
Paul C Boutros

Abstract Cancer is a leading cause of death worldwide. Sex influences cancer in a bewildering variety of ways. In some cancer types it affects prevalence, in others genomic profiles, or response to treatment, or mortality. In some, sex seems to have little or no influence. How and when sex influences cancer initiation and progression remains a critical gap in our understanding of cancer, with direct relevance to precision medicine. Here, we note several factors that complicate our understanding of sex differences: representativeness of large cohorts, confounding with features like ancestry, age and obesity, and variability in clinical presentation. We summarize the key resources available to study molecular sex differences, and suggest some likely directions for improving our understanding of how patient sex influences cancer behaviour.

