RoDiCE: robust differential protein co-expression analysis for cancer complexome

Author(s):  
Yusuke Matsui ◽  
Yuichi Abe ◽  
Kohei Uno ◽  
Satoru Miyano

Abstract Motivation: The full spectrum of abnormalities in cancer-associated protein complexes remains largely unknown. Comparing the co-expression structure of each protein complex between tumor and healthy cells may provide insights regarding cancer-specific protein dysfunction. However, the technical limitations of mass spectrometry-based proteomics, including contamination with biological protein variants, cause noise that leads to non-negligible over- (or under-) estimation of co-expression. Results: We propose a robust algorithm for identifying protein complex aberrations in cancer based on differential protein co-expression testing. Our copula-based method improves identification accuracy with noisy data compared to conventional linear correlation-based approaches. As an application, we use large-scale proteomic data from renal cancer to show that important protein complexes, regulatory signaling pathways and drug targets can be identified. The proposed approach surpasses traditional linear correlations to provide insights into higher-order differential co-expression structures. Availability and implementation: https://github.com/ymatts/RoDiCE. Supplementary information: Supplementary data are available at Bioinformatics online.
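
The idea of comparing rank-based (copula) dependence between two groups can be sketched as follows. This is a minimal illustration under stated assumptions, not the RoDiCE implementation: the function names, the grid-based squared-difference statistic, and the permutation scheme are all hypothetical choices made for this example, and ties in the data are ranked arbitrarily.

```python
import random


def pseudo_observations(xs):
    """Rank-transform a sample into (0, 1) using rank / (n + 1)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def empirical_copula(u, v, a, b):
    """Fraction of points whose pseudo-observations fall at or below (a, b)."""
    return sum(1 for ui, vi in zip(u, v) if ui <= a and vi <= b) / len(u)


def copula_distance(group1, group2, grid=10):
    """Mean squared difference of two empirical copulas on a regular grid.

    Each group is a list of (x, y) abundance pairs for two proteins.
    """
    x1, y1 = zip(*group1)
    x2, y2 = zip(*group2)
    u1, v1 = pseudo_observations(x1), pseudo_observations(y1)
    u2, v2 = pseudo_observations(x2), pseudo_observations(y2)
    points = [i / grid for i in range(1, grid)]
    total = 0.0
    for a in points:
        for b in points:
            diff = empirical_copula(u1, v1, a, b) - empirical_copula(u2, v2, a, b)
            total += diff * diff
    return total / (len(points) ** 2)


def permutation_pvalue(group1, group2, n_perm=200, seed=0):
    """Permutation p-value: shuffle group membership of the (x, y) pairs."""
    rng = random.Random(seed)
    observed = copula_distance(group1, group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if copula_distance(pooled[:n1], pooled[n1:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because only ranks enter the statistic, any strictly increasing rescaling of a protein's abundance within a group leaves the distance unchanged, which is one reason rank-based dependence measures are less sensitive to outliers than a linear correlation coefficient.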

2020 ◽  
Author(s):  
Yusuke Matsui ◽  
Yuichi Abe ◽  
Kohei Uno ◽  
Satoru Miyano

Abstract Motivation: The full picture of abnormalities in protein complexes in cancer remains largely unknown. Comparing the co-expression structure of each protein complex between tumor and normal groups could help us understand the cancer-specific dysfunction of proteins. However, the technical limitations of mass spectrometry-based proteomics and biological variations contaminate protein expression with noise, leading to non-negligible over- (or under-) estimation of co-expression. Results: We propose a robust algorithm for identifying protein complex aberrations in cancer based on differential protein co-expression testing. Our copula-based method improves identification accuracy with noisy data over a conventional linear correlation-based approach. As an application, we show that important protein complexes, regulatory signaling pathways, and even drug targets can be identified using large-scale proteomics data from renal cancer. The proposed approach goes beyond traditional linear correlations to provide insights into higher-order differential co-expression structures. Availability and Implementation: https://github.com/ymatts/[email protected]


2019 ◽  
Author(s):  
Wojciech Michalak ◽  
Vasileios Tsiamis ◽  
Veit Schwämmle ◽  
Adelina Rogowska-Wrzesińska

Abstract We have developed ComplexBrowser, an open source, online platform for supervised analysis of quantitative proteomics data that focuses on protein complexes. The software uses information from the CORUM and Complex Portal databases to identify protein complex components. Based on the expression changes of individual complex subunits across the proteomics experiment, it calculates a Complex Fold Change (CFC) factor that characterises the overall protein complex expression trend and the level of subunit co-regulation. Thus up- and down-regulated complexes can be identified. It provides interactive visualisation of protein complex composition and expression for exploratory analysis. It also incorporates a quality control step that includes normalisation and statistical analysis based on the Limma test. ComplexBrowser performance was tested on two previously published proteomics studies identifying changes in protein expression in human adenocarcinoma tissue and during activation of mouse T-cells. The analysis revealed 1519 and 332 protein complexes, of which 233 and 41 were found co-ordinately regulated in the respective studies. The adopted approach provided evidence for a shift to glucose-based metabolism and high proliferation in adenocarcinoma tissues and identified chromatin remodelling complexes involved in mouse T-cell activation. The results correlate with the original interpretation of the experiments and also provide novel biological details about the protein complexes affected.
ComplexBrowser is, to our knowledge, the first tool to automate quantitative protein complex analysis for high-throughput studies, providing insights into protein complex regulation within minutes of analysis. A fully functional demo version of ComplexBrowser v1.0 is available online via http://computproteomics.bmb.sdu.dk/Apps/ComplexBrowser/. The source code can be downloaded from https://bitbucket.org/michalakw/complexbrowser.
Highlights: Automated analysis of protein complexes in proteomics experiments. Quantitative measure of the coordinated changes in protein complex components. Interactive visualisations for exploratory analysis of proteomics results.
In brief: ComplexBrowser is capable of identifying protein complexes in datasets obtained from large-scale quantitative proteomics experiments. It provides, in the form of the CFC factor, a quantitative measure of the coordinated changes in complex components. This facilitates assessing the overall trends in the processes governed by the identified protein complexes, providing a new and complementary way of interpreting proteomics experiments.


2017 ◽  
Author(s):  
Caroline Ross ◽  
Bilal Nizami ◽  
Michael Glenister ◽  
Olivier Sheik Amamuddy ◽  
Ali Rana Atilgan ◽  
...  

Abstract Summary: MODE-TASK, a novel software suite, comprises Principal Component Analysis, Multidimensional Scaling, and t-Distributed Stochastic Neighbor Embedding techniques using molecular dynamics trajectories. MODE-TASK also includes a Normal Mode Analysis tool based on the Anisotropic Network Model so as to provide a variety of ways to analyse and compare large-scale motions of protein complexes for which long MD simulations are prohibitive. Availability and Implementation: MODE-TASK has been open-sourced, and is available for download from https://github.com/RUBi-ZA/MODE-TASK, implemented in Python and C++. Supplementary information: Documentation available at http://mode-task.readthedocs.io.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 782 ◽  
Author(s):  
Virja Mehta ◽  
Laura Trinkle-Mulcahy

Protein-protein interactions (PPIs) underlie most, if not all, cellular functions. The comprehensive mapping of these complex networks of stable and transient associations thus remains a key goal, both for systems biology-based initiatives (where it can be combined with other ‘omics’ data to gain a better understanding of functional pathways and networks) and for focused biological studies. Despite the significant challenges of such an undertaking, major strides have been made over the past few years. They include improvements in the computational prediction of PPIs and the literature curation of low-throughput studies of specific protein complexes, but also an increase in the deposition of high-quality data from non-biased high-throughput experimental PPI mapping strategies into publicly available databases.


Author(s):  
Maik Pietzner ◽  
Eleanor Wheeler ◽  
Julia Carrasco-Zanini ◽  
Johannes Raffler ◽  
Nicola D. Kerrison ◽  
...  

ABSTRACT Strategies to develop therapeutics for SARS-CoV-2 infection may be informed by experimental identification of viral-host protein interactions in cellular assays and measurement of host response proteins in COVID-19 patients. Identification of genetic variants that influence the level or activity of these proteins in the host could enable rapid ‘in silico’ assessment in human genetic studies of their causal relevance as molecular targets for new or repurposed drugs to treat COVID-19. We integrated large-scale genomic and aptamer-based plasma proteomic data from 10,708 individuals to characterize the genetic architecture of 179 host proteins reported to interact with SARS-CoV-2 proteins or to participate in the host response to COVID-19. We identified 220 host DNA sequence variants acting in cis (MAF 0.01-49.9%) and explaining 0.3-70.9% of the variance of 97 of these proteins, including 45 with no previously known protein quantitative trait loci (pQTL) and 38 encoding current drug targets. Systematic characterization of pQTLs across the phenome identified protein-drug-disease links and evidence that putative viral interaction partners such as MARK3 affect immune response, and established the first link between a recently reported variant for respiratory failure of COVID-19 patients at the ABO locus and hypercoagulation, i.e. maladaptive host response. Our results accelerate the evaluation and prioritization of new drug development programmes and repurposing of trials to prevent, treat or reduce adverse outcomes. Rapid sharing and dynamic and detailed interrogation of results are facilitated through an interactive webserver (https://omicscience.org/apps/covidpgwas/).


Biomolecules ◽  
2020 ◽  
Vol 10 (7) ◽  
pp. 1056 ◽  
Author(s):  
Kalyani Dhusia ◽  
Zhaoqian Su ◽  
Yinghao Wu

The formation of functionally versatile protein complexes underlies almost every biological process. The estimation of how fast these complexes can be formed has broad implications for unravelling the mechanism of biomolecular recognition. This kinetic property is traditionally quantified by association rates, which can be measured through various experimental techniques. To complement these time-consuming and labor-intensive approaches, we developed a coarse-grained simulation approach to study the physical processes of protein–protein association. We systematically calibrated our simulation method against a large-scale benchmark set. By combining a physics-based force field with a statistically-derived potential in the simulation, we found that the association rates of more than 80% of protein complexes can be correctly predicted within one order of magnitude relative to their experimental measurements. We further showed that a mixture of force fields derived from complementary sources was able to describe the process of protein–protein association with mechanistic details. For instance, we show that the association of a protein complex involves multiple steps in which proteins continuously search their local binding orientations and form non-native-like intermediates through repeated dissociation and re-association. Moreover, with an ensemble of loosely bound encounter complexes observed around their native conformation, we suggest that the transition states of protein–protein association could be highly diverse on the structural level. Our study also supports the idea that the association of a protein complex is driven by a “funnel-like” energy landscape. In summary, these results shed light on how protein–protein recognition is kinetically modulated, and our coarse-grained simulation approach can serve as a useful addition to the existing experimental approaches that measure protein–protein association rates.
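
The "within one order of magnitude" criterion mentioned above can be made concrete with a short sketch. The helper names are hypothetical, and the criterion is assumed here to mean that the absolute base-10 log ratio of predicted to measured rate is at most 1; the abstract does not spell out the exact definition used.

```python
import math


def within_one_order(predicted, measured):
    """True if the predicted rate is within a factor of 10 of the measured rate."""
    return abs(math.log10(predicted / measured)) <= 1.0


def fraction_within(pairs):
    """Fraction of (predicted, measured) rate pairs meeting the criterion."""
    return sum(within_one_order(p, m) for p, m in pairs) / len(pairs)
```

Working on the log scale treats over- and under-prediction symmetrically, which suits association rates that span several orders of magnitude.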


2020 ◽  
Vol 20 (11) ◽  
Author(s):  
Julia Carrasco Zanini ◽  
Maik Pietzner ◽  
Claudia Langenberg

Abstract Purpose of the Review: Proteins are the central layer of information transfer from genome to phenome and represent the largest class of drug targets. We review recent advances in high-throughput technologies that provide comprehensive, scalable profiling of the plasma proteome with the potential to improve prediction and mechanistic understanding of type 2 diabetes (T2D). Recent Findings: Technological and analytical advancements have enabled identification of novel protein biomarkers and signatures that help to address challenges of existing approaches to predict and screen for T2D. Genetic studies have so far revealed putative causal roles for only a few of the proteins that have been linked to T2D, but ongoing large-scale genetic studies of the plasma proteome will help to address this and increase our understanding of aetiological pathways and mechanisms leading to diabetes. Summary: Studies of the human plasma proteome have started to elucidate its potential for T2D prediction and biomarker discovery. Future studies integrating genomic and proteomic data will provide opportunities to prioritise drug targets and identify pathways linking genetic predisposition to T2D development.


2021 ◽  
Vol 1 ◽  
Author(s):  
Gökçe Senger ◽  
Martin H. Schaefer

Protein assembly is a highly dynamic process, and proteins can interact in different ways and stoichiometries within a complex. The importance of maintaining protein stoichiometry for complex function and avoiding aggregation of orphan subunits has been demonstrated. However, how exactly the organization of proteins into complexes constrains differential protein abundance in extreme cellular conditions like cancer, where many protein abundance changes occur, has not been systematically investigated. To study this, we collected proteomic data made available by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) to quantify proteomic changes during carcinogenesis and systematically tested five interaction types in complexes to investigate which of these features affect protein abundance correlation patterns in cancer. We found that a higher than expected fraction of protein complex subunits show no change in abundance compared to normal samples. Furthermore, we found that the way proteins interact in complexes indeed constrains their co-abundance patterns. Our results highlight the role of the interactions between the proteins and the need of cancer cells to deal with aberrant changes in protein abundance.


2018 ◽  
Author(s):  
Bianca K Stöcker ◽  
Till Schäfer ◽  
Petra Mutzel ◽  
Johannes Köster ◽  
Nils Kriege ◽  
...  

Being able to quantify the similarity between two protein complexes is essential for numerous applications. Prominent examples are database searches for known complexes with a given query complex, comparison of the output of different protein complex prediction algorithms, or summarizing and clustering protein complexes, e.g., for visualization. While the corresponding problems have received much attention on single proteins and protein families, the question about how to model and compute similarity between protein complexes has not yet been systematically studied. Because protein complexes can be naturally modeled as graphs, in principle general graph similarity measures may be used, but these are often computationally hard to obtain and do not take typical properties of protein complexes into account. Here we propose a parametric family of similarity measures based on Weisfeiler-Lehman labeling. We evaluate it on simulated complexes of the extended human integrin adhesome network. Because the connectivity (graph topology) of real complexes is often unknown and hard to obtain experimentally, we use both known protein-protein interaction networks and known interdependencies (constraints) between interactions to simulate more realistic complexes than from interaction networks alone. We empirically show that the defined family of similarity measures is in good agreement with edit similarity, a similarity measure derived from graph edit distance, but can be much more efficiently computed. It can therefore be used in large-scale studies and simulations and serve as a basis for further refinements of modeling protein complex similarity.
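
As a rough illustration of Weisfeiler-Lehman labeling applied to complexes modeled as graphs, the sketch below refines node labels by repeatedly appending the sorted labels of each node's neighbours, then compares two graphs by the overlap of their label multisets. The Jaccard-style overlap and the function names are assumptions made for this example; they are not the parametric family of measures defined in the paper.

```python
from collections import Counter


def wl_labels(adj, labels, iterations=2):
    """Collect all node labels seen across WL refinement rounds.

    adj maps each node to a list of neighbours; labels maps each node
    to its initial string label (for example, a protein name).
    """
    current = dict(labels)
    seen = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node in adj:
            neighbour_labels = sorted(current[n] for n in adj[node])
            refined[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = refined
        seen.update(current.values())
    return seen


def wl_similarity(graph1, graph2, iterations=2):
    """Multiset Jaccard overlap of WL labels; each graph is (adj, labels)."""
    c1 = wl_labels(graph1[0], graph1[1], iterations)
    c2 = wl_labels(graph2[0], graph2[1], iterations)
    intersection = sum((c1 & c2).values())
    union = sum((c1 | c2).values())
    return intersection / union if union else 1.0
```

Each refinement round costs time proportional to the number of edges (up to the sorting of neighbour labels), which is what makes WL-based measures far cheaper than graph edit distance. The number of rounds controls how much topology, as opposed to mere protein membership, influences the score.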


2021 ◽  
Author(s):  
Varun S. Sharma ◽  
Andrea Fossati ◽  
Rodolfo Ciuffa ◽  
Marija Buljan ◽  
Evan G. Williams ◽  
...  

Summary It is a general assumption of molecular biology that the ensemble of expressed molecules, their activities and interactions determine biological processes, cellular states and phenotypes. Quantitative abundances of transcripts, proteins and metabolites are now routinely measured with considerable depth via an array of “OMICS” technologies, and recently a number of methods have also been introduced for the parallel analysis of the abundance, subunit composition and cell state specific changes of protein complexes. In comparison to the measurement of the molecular entities in a cell, the determination of their function remains experimentally challenging and labor-intensive. This holds particularly true for determining the function of protein complexes, which constitute the core functional assemblies of the cell. Therefore, the tremendous progress in multi-layer molecular profiling has been slow to translate into increased functional understanding of biological processes, cellular states and phenotypes. In this study we describe PCfun, a computational framework for the systematic annotation of protein complex function using Gene Ontology (GO) terms. This work is built upon the use of word embeddings (natural language text embedded into a continuous vector space that preserves semantic relationships) generated from the machine reading of 1 million open access PubMed Central articles. PCfun leverages the embedding for rapid annotation of protein complex function by integrating two approaches: (1) an unsupervised approach that obtains the nearest neighbor (NN) GO term word vectors for a protein complex query vector, and (2) a supervised approach using Random Forest (RF) models trained specifically for recovering the GO terms of protein complex queries described in the CORUM protein complex database.
PCfun consolidates both approaches by performing a statistical test for the enrichment of the top NN GO terms within the child terms of the GO terms predicted by the RF models. Thus, PCfun amalgamates information learned from the gold-standard protein complex database, CORUM, with the unbiased predictions obtained directly from the word embedding, thereby enabling PCfun to identify the potential functions of putative protein complexes. The documentation and examples of the PCfun package are available at https://github.com/sharmavaruns/PCfun. We anticipate that PCfun will serve as a useful tool and novel paradigm for the large-scale characterization of protein complex function.
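
The unsupervised nearest-neighbor step can be illustrated with a small sketch that ranks term vectors by cosine similarity to a query vector. This is not the PCfun API: the function names are hypothetical, cosine similarity is an assumed choice of metric, and how PCfun builds the query vector for a protein complex is not described in the abstract.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_terms(query, term_vectors, k=2):
    """Return the k term names whose vectors are most similar to the query."""
    ranked = sorted(
        term_vectors,
        key=lambda term: cosine(query, term_vectors[term]),
        reverse=True,
    )
    return ranked[:k]
```

In practice an embedding has hundreds of dimensions and many thousands of terms, so a vectorised or approximate nearest-neighbor search would replace the full sort used here.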

