Leveraging protein dynamics to identify cancer mutational hotspots using 3D structures

2019 ◽  
Vol 116 (38) ◽  
pp. 18962-18970 ◽  
Author(s):  
Sushant Kumar ◽  
Declan Clarke ◽  
Mark B. Gerstein

Large-scale exome sequencing of tumors has enabled the identification of cancer drivers using recurrence-based approaches. Some of these methods also employ 3D protein structures to identify mutational hotspots in cancer-associated genes. In determining such mutational clusters in structures, existing approaches overlook protein dynamics, despite its essential role in protein function. We present a framework to identify cancer driver genes using a dynamics-based search of mutational hotspot communities. Mutations are mapped to protein structures, which are partitioned into distinct residue communities. These communities are identified in a framework where residue–residue contact edges are weighted by correlated motions (as inferred by dynamics-based models). We then search for signals of positive selection among these residue communities to identify putative driver genes, and we apply our method to the TCGA (The Cancer Genome Atlas) PanCancer Atlas missense mutation catalog. Overall, we predict 1 or more mutational hotspots within the resolved structures of proteins encoded by 434 genes. These genes were enriched among biological processes associated with tumor progression. Additionally, a comparison between our approach and existing cancer hotspot detection methods using structural data suggests that including protein dynamics significantly increases the sensitivity of driver detection.
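The partition-then-count idea in this abstract can be illustrated with a toy sketch. This is not the authors' pipeline: the edge weights below are made-up stand-ins for correlated-motion scores, and a thresholded connected-components grouping is swapped in for a real community detection algorithm. The names `partition_residues`, `mutations_per_community`, and `min_weight` are hypothetical.

```python
from collections import defaultdict


def partition_residues(edges, min_weight):
    """Group residues into connected components, keeping only edges
    whose weight is at least min_weight (a simplified stand-in for
    community detection on a dynamics-weighted contact graph)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, weight in edges:
        root_a, root_b = find(a), find(b)  # registers both residues
        if weight >= min_weight and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for residue in parent:
        groups[find(residue)].add(residue)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def mutations_per_community(communities, mutated_positions):
    """Count how many observed mutations fall in each community."""
    counts = []
    for community in communities:
        members = set(community)
        counts.append(sum(1 for pos in mutated_positions if pos in members))
    return counts


# Hypothetical data: (residue_i, residue_j, correlated-motion weight).
edges = [(1, 2, 0.9), (2, 3, 0.8), (3, 4, 0.1), (4, 5, 0.7), (5, 6, 0.85)]
communities = partition_residues(edges, min_weight=0.5)
counts = mutations_per_community(communities, [2, 2, 3, 5])
print(communities, counts)
```

A real analysis would follow the counting step with a statistical test for excess mutations per community; that step is omitted here because the abstract does not specify it.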

2018 ◽  
Author(s):  
Sushant Kumar ◽  
Declan Clarke ◽  
Mark B. Gerstein

Abstract: Large-scale exome sequencing of tumors has enabled the identification of cancer drivers using recurrence- and clustering-based approaches. Some of these methods also employ three-dimensional protein structures to identify mutational hotspots in cancer-associated genes. In determining such mutational clusters in structures, existing approaches overlook protein dynamics, despite the essential role of dynamics in protein functionality. In this work, we present a framework to identify driver genes using a dynamics-based search of mutational hotspot communities. After partitioning 3D structures into distinct communities of residues using anisotropic network models, we map variants onto the partitioned structures. We then search for signals of positive selection among these residue communities to identify putative drivers. We applied our method using the TCGA pan-cancer atlas missense mutation catalog. Overall, our analyses predict one or more mutational hotspots within the resolved structures of 434 genes. Ontological and pathway enrichment analyses indicate that genes with predicted hotspots are enriched in biological processes associated with tumor progression. Additionally, a comparison between our approach and existing hotspot detection methods that use structural data suggests that the inclusion of dynamics significantly increases the sensitivity of driver detection.


Author(s):  
Caitlyn L. McCafferty ◽  
Edward M. Marcotte ◽  
David W. Taylor

Abstract: Protein-protein interactions are critical to protein function, but three-dimensional (3D) arrangements of interacting proteins have proven hard to predict, even given the identities and 3D structures of the interacting partners. Specifically, identifying the relevant pairwise interaction surfaces remains difficult, often relying on shape complementarity with molecular docking while accounting for molecular motions to optimize rigid 3D translations and rotations. However, such approaches can be computationally expensive, and faster, less accurate approximations may prove useful for large-scale prediction and assembly of 3D structures of multi-protein complexes. We asked if a reduced representation of protein geometry retains enough information about molecular properties to predict pairwise protein interaction interfaces that are tolerant of limited structural rearrangements. Here, we describe a cuboid transformation of 3D protein accessible surfaces on which molecular properties such as charge, hydrophobicity, and mutation rate can be easily mapped, implemented in the MorphProt package. Pairs of surfaces are compared to rapidly assess partner-specific potential surface complementarity. On two available benchmarks comprising 85 known protein complexes overall, we observed F1 scores (a weighted combination of precision and recall) of 19-34% at correctly identifying protein interaction surfaces, comparable to more computationally intensive 3D docking methods in the annual Critical Assessment of PRedicted Interactions. Furthermore, we examined the effect of molecular motion through normal mode simulation on a benchmark receptor-ligand pair and observed no marked loss of predictive accuracy for distortions of up to 6 Å RMSD. Thus, a cuboid transformation of protein surfaces retains considerable information about surface complementarity, offers enhanced speed of comparison relative to more complex geometric representations, and exhibits tolerance to conformational changes.
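For readers unfamiliar with the F1 score quoted above, the following minimal sketch shows the standard definition (the harmonic mean of precision and recall). The abstract does not say how interface predictions were scored, so treating a prediction as a set of residue indices is an assumption made here for illustration; `f1_score` and the example sets are hypothetical.

```python
def f1_score(predicted, actual):
    """Standard F1: harmonic mean of precision and recall over two sets."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical residue indices, not data from the study.
predicted_interface = {10, 11, 12, 40}
true_interface = {11, 12, 13, 14, 15, 16}
score = f1_score(predicted_interface, true_interface)
print(f"F1 = {score:.2f}")
```

Here two of four predicted residues are correct (precision 0.5) and two of six true residues are recovered (recall 1/3), so the score sits between the two values and closer to the lower one.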


2020 ◽  
Author(s):  
João Henriques ◽  
Kresten Lindorff-Larsen

Abstract: Proteins carry out a wide range of functions that are tightly regulated in space and time. Protein phosphorylation is the most common post-translational modification of proteins and plays key roles in the regulation of many biological processes. The finding that many phosphorylated residues are not solvent exposed in the unphosphorylated state raises several questions about the mechanism that underlies phosphorylation and how phosphorylation may affect protein structures. First, since kinases need access to the phosphorylated residue, how do such buried residues become modified? Second, once phosphorylated, what are the structural effects of phosphorylation of buried residues, and do they lead to changed conformational dynamics? We have used the ternary complex between p27, Cdk2 and Cyclin A to study these questions using enhanced sampling molecular dynamics simulations. In line with previous NMR and single-molecule fluorescence experiments, we observe transient exposure of Tyr88 in p27, even in its unphosphorylated state. Once Tyr88 is phosphorylated, we observe a coupling to a second site, thus making Tyr74 more easily exposed, and thereby the target for a second phosphorylation step. Our observations provide atomic details on how protein dynamics plays a role in modulating multi-site phosphorylation in p27, thus supplementing previous experimental observations. More generally, we discuss how the observed phenomenon of transient exposure of buried residues may play a more general role in regulating protein function.

Significance Statement: Protein phosphorylation is a common post-translational modification and is carried out by kinases. While many phosphorylation sites are located in disordered regions of proteins or in loops, a surprisingly large number of modification sites are buried inside folded domains. This observation led us to ask how kinases gain access to such buried residues. We used the complex between p27, a regulator of cell cycle progression, and Cyclin-dependent kinase 2/Cyclin A to study this problem. We hypothesized that transient exposure of buried tyrosines in p27 to the solvent would make them accessible to kinases, explaining how buried residues get modified. We provide an atomic-level description of these dynamic processes, revealing how protein dynamics plays a role in regulation.


2019 ◽  
Author(s):  
Amy Li ◽  
Bjoern Chapuy ◽  
Xaralabos Varelas ◽  
Paola Sebastiani ◽  
Stefano Monti

Abstract: The emergence of large-scale multi-omics data warrants method development for data integration. Genomic studies from cancer patients have identified epigenetic and genetic regulators (such as methylation marks, somatic mutations, and somatic copy number alterations (SCNAs), among others) as predictive features of cancer outcome. However, identification of “driver genes” associated with a given alteration remains a challenge. To this end, we developed a computational tool, iEDGE, to model cis and trans effects of (epi-)DNA alterations and identify potential cis driver genes, where cis and trans genes denote those genes falling within and outside the genomic boundaries of a given (epi-)genetic alteration, respectively.

First, iEDGE identifies the cis and trans genes associated with the presence/absence of a particular epi-DNA alteration across samples. Tests of statistical mediation are then performed to determine the cis genes predictive of the trans gene expression. Finally, cis and trans effects are annotated by pathway enrichment analysis to gain insights into the underlying regulatory networks.

We used iEDGE to perform integrative analysis of SCNAs and gene expression data from breast cancer and 18 additional cancer types included in The Cancer Genome Atlas (TCGA). Notably, cis gene drivers identified by iEDGE were found to be significantly enriched for known driver genes from multiple compendia of validated oncogenes and tumor suppressors, suggesting that the remainder are of equal importance. Furthermore, predicted drivers were enriched for functionally relevant cancer genes with amplification-driven dependencies, which are of potential prognostic and therapeutic value. All analysis results are accessible at https://montilab.bu.edu/iEDGE.


2021 ◽  
Author(s):  
Sandeep Kaur ◽  
Neblina Sikta ◽  
Andrea Schafferhans ◽  
Nicola Bordin ◽  
Mark J. Cowley ◽  
...  

Abstract

Motivation: Variant analysis is a core task in bioinformatics that requires integrating data from many sources. This process can be helped by using 3D structures of proteins, which supply a spatial context that can give insight into how variants affect function. Many available tools can help with mapping variants onto structures, but each has specific restrictions, with the result that many researchers fail to benefit from valuable insights that could be gained from structural data.

Results: To address this, we have created a streamlined system for incorporating 3D structures into variant analysis. Variants can be specified via URLs that are easy to read and write and that use the notation recommended by the Human Genome Variation Society (HGVS). For example, ‘https://aquaria.app/SARS-CoV-2/S/?N501Y’ specifies the N501Y variant of SARS-CoV-2 S protein. In addition to mapping variants onto structures, our system provides summary information from multiple external resources, including COSMIC, CATH-FunVar, and PredictProtein. Furthermore, our system identifies and summarizes structures containing the variant, as well as the variant-position. Our system supports essentially any mutation for any well-studied protein, and uses all available structural data (including models inferred via very remote homology) integrated into a system that is fast and simple to use. By giving researchers easy, streamlined access to a wealth of structural information during variant analysis, our system will help in revealing novel insights into the molecular mechanisms underlying protein function in health and disease.

Availability: Our resource is freely available at the project home page (https://aquaria.app). After peer review, the code will be openly available via a GPL version 2 license at https://github.com/ODonoghueLab/Aquaria. PSSH2, the database of sequence-to-structure alignments, is also freely available for download at https://zenodo.org/record/…

Contact: [email protected]

Supplementary information: None.
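The URL scheme described in the Results can be sketched as a small helper. Only the single example URL above comes from the source; the general `organism/protein/?variant` pattern, the function name `aquaria_variant_url`, and the one-letter substitution check are assumptions made for illustration, not a documented Aquaria API.

```python
import re
from urllib.parse import quote


def aquaria_variant_url(organism, protein, variant):
    """Build a variant URL following the pattern of the example in the
    abstract. The pattern is inferred from that one example only."""
    # Assumed simplification: accept only one-letter substitutions
    # such as N501Y (reference residue, position, new residue).
    if not re.fullmatch(r"[A-Z]\d+[A-Z]", variant):
        raise ValueError(f"unsupported variant notation: {variant!r}")
    return f"https://aquaria.app/{quote(organism)}/{quote(protein)}/?{variant}"


url = aquaria_variant_url("SARS-CoV-2", "S", "N501Y")
print(url)
```

Validating the variant string before building the URL keeps malformed input from silently producing a link that resolves to the wrong page or to none at all.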


2019 ◽  
Author(s):  
Martin Simonovsky ◽  
Joshua Meyers

Abstract

Motivation: Protein binding site comparison (pocket matching) is of importance in drug discovery. Identification of similar binding sites can help guide efforts for hit finding, understanding polypharmacology and characterization of protein function. The design of pocket matching methods has traditionally involved much intuition, and has employed a broad variety of algorithms and representations of the input protein structures. We regard the high heterogeneity of past work and the recent availability of large-scale benchmarks as an indicator that a data-driven approach may provide a new perspective.

Results: We propose DeeplyTough, a convolutional neural network that encodes a three-dimensional representation of protein binding sites into descriptor vectors that may be compared efficiently in an alignment-free manner by computing pairwise Euclidean distances. The network is trained with supervision: (i) to provide similar pockets with similar descriptors, (ii) to separate the descriptors of dissimilar pockets by a minimum margin, and (iii) to achieve robustness to nuisance variations. We evaluate our method using three large-scale benchmark datasets, on which it demonstrates excellent performance for held-out data coming from the training distribution and competitive performance when the trained network is required to generalize to datasets constructed independently.

Availability: https://github.com/BenevolentAI/…

Contact: [email protected], [email protected]
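Training objectives (i) and (ii) above resemble a generic margin-based contrastive loss over Euclidean distances. The sketch below shows that generic form only; it is not taken from the DeeplyTough code, and the function name, the squared-distance formulation, and the margin values are assumptions.

```python
import math


def contrastive_loss(desc_a, desc_b, similar, margin=1.0):
    """Generic contrastive loss on two descriptor vectors.

    Similar pairs are penalised by their squared distance; dissimilar
    pairs are penalised only while they are closer than the margin.
    """
    distance = math.dist(desc_a, desc_b)  # Euclidean distance
    if similar:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Hypothetical 2-D descriptors at distance 5 from each other.
pocket_a = [0.0, 0.0]
pocket_b = [3.0, 4.0]

loss_similar = contrastive_loss(pocket_a, pocket_b, similar=True)
loss_close = contrastive_loss(pocket_a, pocket_b, similar=False, margin=6.0)
loss_far = contrastive_loss(pocket_a, pocket_b, similar=False, margin=4.0)
print(loss_similar, loss_close, loss_far)
```

Because the dissimilar-pair term drops to zero once the pair is separated by at least the margin, already well-separated pockets stop contributing to the loss, which matches the "minimum margin" wording in the abstract.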


2018 ◽  
Author(s):  
Collin Tokheim ◽  
Rachel Karchin

Summary: Large-scale cancer sequencing studies of patient cohorts have statistically implicated many genes driving cancer growth and progression, and their identification has yielded substantial translational impact. However, a remaining challenge is to increase the resolution of driver prediction from the gene level to the mutation level, because mutation-level predictions are more closely aligned with the goal of precision cancer medicine. Here we present CHASMplus, a computational method that is uniquely capable of identifying driver missense mutations, including those specific to a cancer type, as evidenced by significantly superior performance on diverse benchmarks. Applied to 8,657 tumor samples across 32 cancer types in The Cancer Genome Atlas, CHASMplus identifies over 4,000 unique driver missense mutations in 240 genes, supporting a prominent role for rare driver mutations. We show which TCGA cancer types are likely to yield discovery of new driver missense mutations by additional sequencing, which has important implications for public policy.

Significance: Missense mutations are the most frequent mutation type in cancers and the most difficult to interpret. While many computational methods have been developed to predict whether genes are cancer drivers or whether missense mutations are generally deleterious or pathogenic, there has not previously been a method to score the oncogenic impact of a missense mutation specifically by cancer type, limiting adoption of computational missense mutation predictors in the clinic. Cancer patients are routinely sequenced with targeted panels of cancer driver genes, but such genes contain a mixture of driver and passenger missense mutations which differ by cancer type. A patient’s therapeutic response to drugs and optimal assignment to a clinical trial depend on both the specific mutation in the gene of interest and cancer type. We present a new machine learning method honed for each TCGA cancer type, and a resource for fast lookup of the cancer-specific driver propensity of every possible missense mutation in the human exome.


2007 ◽  
Vol 36 (Supplement_1) ◽  
pp. D884-D891 ◽  
Author(s):  
C. Zhang ◽  
O. Crasta ◽  
S. Cammer ◽  
R. Will ◽  
R. Kenyon ◽  
...  

Abstract: The NIAID-funded Biodefense Proteomics Resource Center (RC) provides storage, dissemination, visualization and analysis capabilities for the experimental data deposited by seven Proteomics Research Centers (PRCs). The data and their publication are intended to support researchers working to discover candidates for the next generation of vaccines, therapeutics and diagnostics against NIAID's Category A, B and C priority pathogens. The data include transcriptional profiles, protein profiles, protein structural data and host–pathogen protein interactions, in the context of the pathogen life cycle in vivo and in vitro. The database has stored and supported host or pathogen data derived from Bacillus, Brucella, Cryptosporidium, Salmonella, SARS, Toxoplasma, Vibrio and Yersinia, human tissue libraries, and mouse macrophages. These publicly available data cover diverse data types such as mass spectrometry, yeast two-hybrid (Y2H), gene expression profiles, X-ray and NMR determined protein structures and protein expression clones. The growing database covers over 23 000 unique genes/proteins from different experiments and organisms. All of the genes/proteins are annotated and integrated across experiments using UniProt Knowledgebase (UniProtKB) accession numbers. The web interface for the database enables searching, querying and downloading at the level of experiment, group and individual gene(s)/protein(s) via UniProtKB accession numbers or protein function keywords. The system is accessible at http://www.proteomicsresource.org/.


2020 ◽  
Vol 21 ◽  
Author(s):  
Yin-xue Wang ◽  
Yi-xiang Wang ◽  
Yi-ke Li ◽  
Shi-yan Tu ◽  
Yi-qing Wang

Ovarian cancer (OC) is one of the deadliest gynecological malignancies, and epithelial ovarian cancer (EOC) is its most common form. OC has both a poor prognosis and a high mortality rate owing to the difficulty of early diagnosis, the limitations of current treatments, and resistance to chemotherapy. Extracellular vesicles (EVs) are a heterogeneous group of cell-derived submicron vesicles that can be detected in body fluids and are classified into three main types: exosomes, microvesicles, and apoptotic bodies. Cancer cells can produce more EVs than healthy cells. Moreover, the contents of these EVs have been found to be distinct from each other. EVs shed from tumor cells may therefore have clinical applications, for example as tools for the diagnosis, prognosis, and potential treatment of certain cancers. In this review, we briefly describe the roles of EVs in the diagnosis, prognosis, treatment, and drug resistance of OC. Cancer-related EVs strongly influence tumors through various biological mechanisms. However, these applications remain at the laboratory stage: large-scale clinical trials are lacking, and the immaturity of purification and detection methods is a constraint. In addition, because amplification of oncogenes on extrachromosomal DNA (ecDNA) is remarkably prevalent in cancer, ecDNA may be encapsulated in EVs and thus be detectable. In summary, much more research on EVs needs to be performed to achieve breakthroughs in OC and to accelerate their clinical application.


2021 ◽  
Author(s):  
Marion Germain ◽  
Daniel Kneeshaw ◽  
Louis De Grandpré ◽  
Mélanie Desrochers ◽  
Patrick M. A. James ◽  
...  

Abstract

Context: Although the spatiotemporal dynamics of spruce budworm outbreaks have been intensively studied, forecasting outbreaks remains challenging. During outbreaks, budworm-linked warblers (Tennessee, Cape May, and bay-breasted warbler) show a strong positive response to increases in spruce budworm, but little is known about the relative timing of these responses.

Objectives: We hypothesized that these warblers could be used as sentinels of future defoliation of budworm host trees. We examined the timing and magnitude of the relationships between defoliation by spruce budworm and changes in the probability of presence of warblers to determine whether they responded to budworm infestation before local defoliation was observed by standard detection methods.

Methods: We modelled this relationship using large-scale point count surveys of songbirds and maps of cumulative time-lagged defoliation over multiple spatial scales (2–30 km radius around sampling points) in Quebec, Canada.

Results: All three warbler species responded positively to defoliation at each spatial scale considered, but the timing of their response differed. Maximum probability of presence of Tennessee and Cape May warbler coincided with observations of local defoliation, or provided a one-year warning, making them of little use to guide early interventions. In contrast, the probability of presence of bay-breasted warbler consistently increased 3–4 years before defoliation was detectable.

Conclusions: Early detection is a critical step in the management of spruce budworm outbreaks, and rapid increases in the probability of presence of bay-breasted warbler could be used to identify future epicenters and target ground-based local sampling of spruce budworm.

