Cancer mutational signatures representation by large-scale context embedding

2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i309-i316 ◽  
Author(s):  
Yang Zhang ◽  
Yunxuan Xiao ◽  
Muyu Yang ◽  
Jian Ma

Abstract. Motivation: The accumulation of somatic mutations plays critical roles in cancer development and progression. However, the global patterns of somatic mutations, especially non-coding mutations, and their roles in defining molecular subtypes of cancer have not been well characterized due to the computational challenges in analysing the complex mutational patterns. Results: Here, we develop a new algorithm, called MutSpace, to effectively extract patient-specific mutational features using an embedding framework for larger sequence context. Our method is motivated by the observation that the mutation rate at megabase scale and the local mutational patterns jointly contribute to distinguishing cancer subtypes, both of which can be simultaneously captured by MutSpace. Simulation evaluations show that MutSpace can effectively characterize mutational features from known patient subgroups and achieve superior performance compared with previous methods. As a proof-of-principle, we apply MutSpace to 560 breast cancer patient samples and demonstrate that our method achieves high accuracy in subtype identification. In addition, the learned embeddings from MutSpace reflect intrinsic patterns of breast cancer subtypes and other features of genome structure and function. MutSpace is a promising new framework to better understand cancer heterogeneity based on somatic mutations. Availability and implementation: Source code of MutSpace can be accessed at: https://github.com/ma-compbio/MutSpace. Supplementary information: Supplementary data are available at Bioinformatics online.
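
The two signals named in the abstract (regional mutation rate at megabase scale and local sequence context) can be illustrated with a small counting sketch. This is not MutSpace's embedding model, which the abstract does not specify; the input format, the 3-mer context window, and the function name are assumptions made only for illustration.

```python
from collections import Counter

MEGABASE = 1_000_000  # bin width for the regional (megabase-scale) count


def mutation_features(mutations, reference):
    """Count one patient's mutations per megabase bin and per local 3-mer context.

    mutations: iterable of (chrom, pos, alt) with 0-based pos (hypothetical format).
    reference: dict mapping chrom -> reference sequence string (toy stand-in).
    """
    bin_counts = Counter()
    context_counts = Counter()
    for chrom, pos, alt in mutations:
        seq = reference[chrom]
        # Regional signal: which megabase bin the mutation falls in.
        bin_counts[(chrom, pos // MEGABASE)] += 1
        # Local signal: the reference base with one flanking base on each side.
        if 0 < pos < len(seq) - 1:
            context = seq[pos - 1:pos + 2]
            context_counts[(context, alt)] += 1
    return bin_counts, context_counts


# Toy data, invented for the example.
toy_reference = {"chr1": "ACGTACGTAC"}
toy_mutations = [("chr1", 1, "T"), ("chr1", 5, "T"), ("chr1", 2, "A")]
bins, contexts = mutation_features(toy_mutations, toy_reference)
print(bins)
print(contexts)
```

Counts like these are raw inputs at best; how MutSpace turns mutations and their contexts into learned embeddings is described in the paper and its source code, not here.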

2012 ◽  
Vol 30 (15_suppl) ◽  
pp. 503-503
Author(s):  
Matthew James Ellis ◽  
Li Ding ◽  
Dong Shen ◽  
Jingqin Luo ◽  
Vera J. Suman ◽  
...  

503 Background: To correlate clinical features of estrogen receptor positive breast cancer with somatic mutations, massively parallel sequencing (MPS) was applied to tumor and normal DNAs accrued from patients treated with neoadjuvant aromatase inhibitors (AI). Methods: MPS was applied to 77 baseline tumor biopsy samples from the preoperative letrozole trial (JACS 2009: 208, 906) and the Z1031 trial (JCO 2011: 29, 2342) followed by targeted sequencing in another 240 trial samples. Standard statistical approaches were used to compare mutation status and clinical parameters, and pathway-based correlation was used to assess interactions between signaling perturbations induced by gene mutations and response to neoadjuvant AI. Results: Eighteen genes were significantly mutated above background. Aside from PIK3CA mutations, the list is dominated by loss-of-function mutations in tumor suppressor genes. Five (RUNX1, CBFB, MYH9, MLL3 and SF3B1) have been previously linked to benign and malignant hematopoietic disorders. Clinical correlation revealed that TP53 mutation was associated with PAM50 LumB status, high-grade histology and high proliferation rates, whereas loss-of-function mutations in MAP3K1 were associated with PAM50 LumA status, low proliferation rates and low-grade histology. Mutations in GATA3 were associated with greater suppression of proliferation upon AI treatment, suggesting mutGATA3 may predict endocrine response. Notably, mutations in MAP3K1 were more common in PIK3CA mutant cases, suggesting cooperation. Pathway analysis demonstrated that rare MAP2K4 mutations produce pathway perturbations similar to those of MAP3K1 mutation, a logical finding since MAP2K4 is a substrate for MAP3K1. Signaling network patterns driven by lncRNA MALAT1 mutations were associated with multiple poor clinical outcome features. Rare mutations in druggable kinases included two in the kinase domain of HER2.
Conclusions: Tumor heterogeneity in luminal-type breast cancer is driven by specific patterns of somatic mutations; however, most druggable or potentially prognostic mutations are infrequent. Prospective clinical trials based on these findings will require comprehensive genome sequencing approaches and large-scale investigations.


Author(s):  
Carmen Moccia ◽  
Kristina Haase

Breast cancer is the second leading cause of death among women worldwide, and while hormone receptor positive subtypes have a clear and effective treatment strategy, other subtypes, such as triple negative breast cancers, do not. Development of new drugs, antibodies, or immune targets requires significant re-consideration of current preclinical models, which frequently fail to mimic the nuances of patient-specific breast cancer subtypes. Each subtype, together with the expression of different markers, genetic and epigenetic profiles, presents a unique tumor microenvironment, which promotes tumor development and progression. For this reason, personalized treatments targeting components of the tumor microenvironment have been proposed to mitigate breast cancer progression, particularly for aggressive triple negative subtypes. To date, animal models remain the gold standard for examining new therapeutic targets; however, there is room for in vitro tools to bridge the biological gap with humans. Tumor-on-chip technologies allow for precise control and examination of the tumor microenvironment and may add to the toolbox of current preclinical models. These new models include key aspects of the tumor microenvironment (stroma, vasculature and immune cells) which have been employed to understand metastases, multi-organ interactions, and, importantly, to evaluate drug efficacy and toxicity in humanized physiologic systems. This review provides insight into advanced in vitro tumor models specific to breast cancer, and discusses their potential and limitations for use as future preclinical patient-specific tools.


Author(s):  
Kai Zheng ◽  
Zhu-Hong You ◽  
Lei Wang ◽  
Leon Wong ◽  
Zhan-Heng Chen ◽  
...  

Abstract. Motivation: PIWI proteins and Piwi-interacting RNAs (piRNAs) are commonly detected in human cancers, especially in germline and somatic tissues, and correlate with poorer clinical outcomes, suggesting that they play a functional role in cancer. As the combinatorial explosion of possible associations between ncRNAs and diseases becomes increasingly apparent, new bioinformatics methods for large-scale identification and prioritization of potential associations are of interest. However, in the real world, the network of interactions between molecules is enormously intricate and noisy, which poses a problem for efficient graph mining. This study makes a preliminary attempt at graph mining on biological networks. Results: In this study, we present a method based on a graph attention network, called GAPDA, to identify potential and biologically significant piRNA-disease associations (PDAs). The attention mechanism calculates a hidden representation of an association in the network based on its neighbor nodes and assigns weights to the inputs to make decisions. In particular, we introduced attention-based graph neural networks to the field of bio-association prediction for the first time, and proposed an abstract network topology suitable for small samples. Specifically, we combined piRNA sequence information and disease semantic similarity with the piRNA-disease association network to construct a new attribute network. In the experiment, GAPDA achieved an AUC of 0.9038 in five-fold cross-validation. It also outperformed methods based on collaborative filtering and attribute features. The experimental results support the promise of graph neural networks for such problems and suggest that GAPDA can be a useful supplement for future biomedical research. Supplementary information: Supplementary data are available at Bioinformatics online.
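
The attention idea described above (a hidden representation computed from neighbor nodes, with learned weights on the inputs) can be sketched as a single generic graph-attention step. This is a textbook-style illustration, not GAPDA's architecture: the node names, feature vectors, and the attention vector are invented, and the usual learned linear transform is assumed to have been applied already.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_aggregate(features, neighbors, attn_vector):
    """One attention step: each node becomes a weighted average of its neighbors.

    features: dict node -> list of floats (assumed already linearly transformed).
    neighbors: dict node -> non-empty list of neighbor nodes (list the node
               itself to include a self-loop).
    attn_vector: list of 2 * feature_dim floats (hypothetical learned parameters,
                 fixed by hand here).
    """
    updated, weights = {}, {}
    for node, nbrs in neighbors.items():
        # Unnormalised score for each (node, neighbor) pair from the concatenated features.
        scores = []
        for nbr in nbrs:
            pair = features[node] + features[nbr]
            scores.append(leaky_relu(sum(a * x for a, x in zip(attn_vector, pair))))
        # Softmax over the neighborhood (shifted by the max for numerical stability).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        weights[node] = dict(zip(nbrs, alphas))
        dim = len(features[node])
        updated[node] = [
            sum(alpha * features[nbr][k] for alpha, nbr in zip(alphas, nbrs))
            for k in range(dim)
        ]
    return updated, weights


# Toy graph, invented for the example.
toy_features = {"piRNA_a": [1.0, 0.0], "disease_x": [0.0, 1.0], "disease_y": [1.0, 1.0]}
toy_neighbors = {
    "piRNA_a": ["piRNA_a", "disease_x", "disease_y"],
    "disease_x": ["disease_x", "piRNA_a"],
    "disease_y": ["disease_y", "piRNA_a"],
}
new_features, attention = attention_aggregate(toy_features, toy_neighbors, [0.5, -0.5, 1.0, 0.25])
print(attention["piRNA_a"])
```

In a trained model the attention vector and the feature transform are learned from data; here they are fixed only to show how the weights over a neighborhood are normalised and applied.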


2020 ◽  
Vol 36 (12) ◽  
pp. 3749-3757 ◽  
Author(s):  
Wei Zheng ◽  
Xiaogen Zhou ◽  
Qiqige Wuyun ◽  
Robin Pearce ◽  
Yang Li ◽  
...  

Abstract. Motivation: Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence. Results: We developed a new algorithm, FUpred, which predicts protein domain boundaries utilizing contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to retrieve domain boundary locations by maximizing the number of intra-domain contacts, while minimizing the number of inter-domain contacts from the contact maps. FUpred was tested on a large-scale dataset consisting of 2549 proteins and generated correct single- and multi-domain classifications with a Matthews correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins. Availability and implementation: https://zhanglab.ccmb.med.umich.edu/FUpred. Supplementary information: Supplementary data are available at Bioinformatics online.
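
The core idea stated above, choosing boundaries so that contacts fall within domains rather than between them, can be illustrated for the simplest case of one cut in a binary contact map. The scoring function below (inter-domain contacts divided by the number of possible inter-domain pairs) is a simplified stand-in chosen for this sketch, not FUpred's actual score, and it does not handle discontinuous domains.

```python
def block_contact_map(sizes, bridges=()):
    """Build a toy symmetric 0/1 contact map with fully connected blocks."""
    n = sum(sizes)
    contact_map = [[0] * n for _ in range(n)]
    start = 0
    for size in sizes:
        for i in range(start, start + size):
            for j in range(start, start + size):
                if i != j:
                    contact_map[i][j] = 1
        start += size
    for i, j in bridges:
        contact_map[i][j] = contact_map[j][i] = 1
    return contact_map


def best_two_domain_split(contact_map, min_len=2):
    """Scan every cut position and keep the one with the lowest inter-domain contact density.

    A cut at k splits residues into [0, k) and [k, n). Returns (cut, score).
    """
    n = len(contact_map)
    best_cut, best_score = None, float("inf")
    for cut in range(min_len, n - min_len + 1):
        inter = sum(contact_map[i][j] for i in range(cut) for j in range(cut, n))
        score = inter / (cut * (n - cut))  # normalise by the number of possible pairs
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


# Two 3-residue domains joined by a single contact between residues 2 and 3.
toy_map = block_contact_map([3, 3], bridges=[(2, 3)])
cut, score = best_two_domain_split(toy_map)
print(cut, score)
```

Normalising by the number of possible pairs keeps the scan from trivially favouring cuts near the ends of the chain; real predicted contact maps are probabilistic and noisy, so a practical method needs more than this exhaustive scan.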


2019 ◽  
Vol 36 (4) ◽  
pp. 994-999
Author(s):  
Qian Zhu ◽  
Xavier Tekpli ◽  
Olga G Troyanskaya ◽  
Vessela N Kristensen

Abstract. Motivation: Breast cancer consists of multiple distinct tumor subtypes, and results from epigenetic and genetic aberrations that give rise to distinct transcriptional profiles. Despite previous efforts to understand transcriptional deregulation through transcription factor networks, the transcriptional mechanisms leading to subtypes of the disease remain poorly understood. Results: We used a sophisticated computational search of thousands of expression datasets to define extended signatures of distinct breast cancer subtypes. Using ENCODE ChIP-seq data of surrogate cell lines and motif analysis, we observed that these subtypes are determined by a distinct repertoire of lineage-specific transcription factors (TFs). Furthermore, we observed specific patterns and abundances of copy number and DNA methylation changes at these TFs and their targets, compared with other genes and with normal cells. Overall, distinct transcriptional profiles are linked to genetic and epigenetic alterations at lineage-specific transcriptional regulators in breast cancer subtypes. Availability and implementation: The analysis code and data are deposited at https://bitbucket.org/qzhu/breast.cancer.tf/. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Gal Dinstag ◽  
Ron Shamir

Abstract. Motivation: Evolution of cancer is driven by a few somatic mutations that disrupt cellular processes, causing abnormal proliferation and tumor development, while most somatic mutations have no impact on progression. Distinguishing those mutated genes that drive tumorigenesis in a patient is a primary goal in cancer therapy: knowledge of these genes and the pathways on which they operate can illuminate disease mechanisms and indicate potential therapies and drug targets. Current research focuses mainly on cohort-level driver gene identification, but patient-specific driver gene identification remains a challenge. Methods: We developed a new algorithm for patient-specific ranking of driver genes. The algorithm, called PRODIGY, analyzes the expression and mutation profiles of the patient along with data on known pathways and protein-protein interactions. PRODIGY quantifies the impact of each mutated gene on every deregulated pathway using the prize-collecting Steiner tree model. Mutated genes are ranked by their aggregated impact on all deregulated pathways. Results: In testing on five TCGA cancer cohorts spanning >2500 patients and comparison to validated driver genes, PRODIGY outperformed extant methods and ranking based on network centrality measures. Our results pinpoint the pleiotropic effect of driver genes and show that PRODIGY is capable of identifying even very rare drivers. Hence, PRODIGY takes a step further towards personalized medicine and treatment. Availability: The PRODIGY R package is available at: https://github.com/Shamir-Lab/PRODIGY. Supplementary information: Supplementary data are available at Bioinformatics online.
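
The final ranking step described above can be sketched on its own, taking the per-pathway impact scores as given. Computing those scores with the prize-collecting Steiner tree model is the substantive part of the method and is not attempted here; the scores, pathway names, and the choice of a plain sum as the aggregation are assumptions for illustration.

```python
def rank_mutated_genes(impact):
    """Rank genes by total impact across deregulated pathways, highest first.

    impact: dict gene -> dict pathway -> non-negative score (hypothetical inputs;
            in the described method these would come from the Steiner tree model).
    Ties are broken alphabetically so the output is deterministic.
    """
    totals = {gene: sum(per_pathway.values()) for gene, per_pathway in impact.items()}
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Toy scores for one patient, invented for the example.
toy_impact = {
    "TP53": {"apoptosis": 0.5, "cell_cycle": 0.25},
    "PIK3CA": {"PI3K_signaling": 0.5},
    "TTN": {},  # mutated, but with no measured effect on any deregulated pathway
}
ranking = rank_mutated_genes(toy_impact)
print(ranking)
```

A gene that touches several deregulated pathways rises in the ranking even when each individual effect is modest, which is consistent with the pleiotropic effect of driver genes noted in the abstract.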


2018 ◽  
Vol 20 (6) ◽  
pp. 2306-2315 ◽  
Author(s):  
Meng Zou ◽  
Rui Jin ◽  
Kin Fai Au

Abstract Intra-tumor heterogeneity is associated with cancer progression and therapeutic resistance, for example in breast cancer. While existing methods for studying tumor heterogeneity analyze only variant allele frequency (VAF), the genotype of variants is also informative for inferring subclones, which can be detected by long reads or paired-end reads. We developed GenoClone to integrate VAF with variant genotype; it showed superior performance in inferring the number of subclones, estimating the fractions of subclones and identifying the somatic single-nucleotide variant composition of subclones. When GenoClone was applied to 389 TCGA breast cancer samples, it revealed extensive intra-tumor heterogeneity. We further found that a few somatic mutations were relevant to the late stage of tumor evolution, including those at the oncogene PIK3CA and the tumor suppressor gene TP53. Moreover, 52 subclones identified from 167 samples shared high similarity of somatic mutations and were clustered into three groups of sizes 24, 14 and 14. This finding may help in understanding the development of breast cancer in certain subgroups of people and in drug development at the population level. Furthermore, GenoClone also identified tumor heterogeneity in different aliquots of the same samples. The implementation of GenoClone is available at http://www.healthcare.uiowa.edu/labs/au/GenoClone/.
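
For readers unfamiliar with the quantity, variant allele frequency is the fraction of reads at a site that carry the variant allele. The short sketch below computes it from read counts; the function name and the example counts are invented, and GenoClone's integration of VAF with read-level genotype information is not shown.

```python
def variant_allele_frequency(alt_reads, ref_reads):
    """Return the fraction of reads covering a site that support the variant allele."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


# Toy read counts for two variants in the same sample, invented for the example.
vaf_a = variant_allele_frequency(alt_reads=25, ref_reads=75)
vaf_b = variant_allele_frequency(alt_reads=10, ref_reads=30)
print(vaf_a, vaf_b)
```

Both toy variants have the same VAF, so frequency alone cannot say whether they sit in the same subclone; that ambiguity is the motivation the abstract gives for also using genotype information from long or paired-end reads.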


2021 ◽  
Author(s):  
Forough Firoozbakht ◽  
Iman Rezaeian ◽  
Luis Rueda ◽  
Alioune Ngom

Abstract 'De novo' drug discovery is costly, slow, and high-risk. Repurposing known drugs for treatment of other diseases offers a fast, low-cost, low-risk and highly efficient route toward development of efficacious treatments. The emergence of large-scale heterogeneous biomolecular networks, molecular, chemical and bioactivity data, and genomic and phenotypic data of pharmacological compounds is enabling the development of a new area of drug repurposing called 'in silico' drug repurposing, i.e., computational drug repurposing (CDR). The aim of CDR is to discover new indications for an existing drug (drug-centric) or to identify effective drugs for a disease (disease-centric). Both drug-centric and disease-centric approaches share the challenge of assessing either the similarity or the connections between drugs and diseases. However, traditional CDR is fraught with many challenges due to the underlying complex pharmacology and biology of diseases, genes, and drugs, as well as the complexity of their associations. As such, capturing highly non-linear associations among drugs, genes and diseases has been challenging for most existing CDR methods. We propose a network-based integration approach that can best capture knowledge (and complex relationships) contained within and between drug, gene and disease data. A network-based machine learning approach is then applied to the extracted knowledge and relationships in order to identify single drugs and pairs of approved or experimental drugs with potential therapeutic effects on different breast cancer subtypes.


2018 ◽  
Author(s):  
Jeramiah J. Smith ◽  
Nataliya Timoshevskaya ◽  
Vladimir A. Timoshevskiy ◽  
Melissa C. Keinath ◽  
Drew Hardy ◽  
...  

Abstract The axolotl (Ambystoma mexicanum) provides critical models for studying regeneration, evolution and development. However, its large genome (~32 gigabases) presents a formidable barrier to genetic analyses. Recent efforts have yielded genome assemblies consisting of thousands of unordered scaffolds that resolve gene structures, but do not yet permit large-scale analyses of genome structure and function. We adapted an established mapping approach to leverage dense SNP typing information and for the first time assemble the axolotl genome into 14 chromosomes. Moreover, we used fluorescence in situ hybridization to verify the structure of these 14 scaffolds and assign each to its corresponding physical chromosome. This new assembly covers 27.3 gigabases and encompasses 94% of annotated gene models on chromosomal scaffolds. We show the assembly’s utility by resolving genome-wide orthologies between the axolotl and other vertebrates, identifying the footprints of historical introgression events that occurred during the development of axolotl genetic stocks, and precisely mapping several phenotypes including a large deletion underlying the cardiac mutant. This chromosome-scale assembly will greatly facilitate studies of the axolotl in biological research.

