A proteomics sample metadata representation for multiomics integration, and big data analysis.

2021 ◽  
Author(s):  
Chengxin Dai ◽  
Anja Füllgrabe ◽  
Julianus Pfeuffer ◽  
Elizaveta Solovyeva ◽  
Jingwen Deng ◽  
...  

The amount of public proteomics data is increasing at an extraordinary rate. Hundreds of datasets are submitted each month to ProteomeXchange repositories, representing many types of proteomics studies that focus on different aspects such as quantitative experiments, post-translational modifications, protein-protein interactions, or subcellular localization, among many others. For every proteomics dataset, two levels of data are captured: the dataset description and the data files (encoded in different file formats). Whereas the dataset description and data file formats are supported by all ProteomeXchange partner repositories, there is no standardized format to properly describe the sample metadata and their relationship with the dataset files in a way that fully allows their understanding or reanalysis. It is left to the user whether to provide an ad hoc document containing this information. Therefore, in many cases, understanding the study design and data requires going back to the associated publication. This can be tedious and may be restricted in the case of non-open access publications. In many cases, this problem limits the generalization and reuse of public proteomics data. Here we present a standard representation for sample metadata tailored to proteomics datasets, produced by the HUPO Proteomics Standards Initiative and supported by ProteomeXchange resources. We repurposed the existing data format MAGE-TAB, used routinely in the transcriptomics field, to represent and annotate proteomics datasets. MAGE-TAB-Proteomics defines a set of annotation rules that the datasets submitted to ProteomeXchange should follow, ranging from sample properties to data analysis protocols. We also introduce a crowdsourcing project that enabled the manual curation of over 200 public datasets using MAGE-TAB-Proteomics. In addition, we describe an ecosystem of tools and libraries that were developed to validate and submit sample metadata-related information to ProteomeXchange. We expect that these tools will improve the reproducibility of published results and facilitate the reanalysis and integration of public proteomics datasets.
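
To make the idea of linking sample metadata to data files concrete, the following is a minimal sketch of reading a tab-delimited sample table and grouping data files by sample. The column names (`source name`, `characteristics[organism]`, `characteristics[disease]`, `comment[data file]`), the file names, and the function `files_by_sample` are illustrative assumptions modeled on MAGE-TAB-style bracketed headers; they are not quoted from the abstract, and this is not the authors' validation tooling.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample table: every column name and value below is an
# illustrative assumption, not taken from the published specification.
EXAMPLE = (
    "source name\tcharacteristics[organism]\tcharacteristics[disease]\tcomment[data file]\n"
    "sample 1\tHomo sapiens\tnormal\trun_01.raw\n"
    "sample 1\tHomo sapiens\tnormal\trun_02.raw\n"
    "sample 2\tHomo sapiens\tbreast cancer\trun_03.raw\n"
)

# Columns this sketch needs in order to link a sample to its files.
REQUIRED = ("source name", "comment[data file]")


def files_by_sample(text):
    """Return {sample name: [data files]} from a tab-delimited table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    mapping = defaultdict(list)
    for row in reader:
        mapping[row["source name"]].append(row["comment[data file]"])
    return dict(mapping)


if __name__ == "__main__":
    for sample, files in files_by_sample(EXAMPLE).items():
        print(sample, files)
```

With one row per sample and data file pair, a reanalysis pipeline could recover the study design from the table alone, without going back to the publication.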

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Chengxin Dai ◽  
Anja Füllgrabe ◽  
Julianus Pfeuffer ◽  
Elizaveta M. Solovyeva ◽  
Jingwen Deng ◽  
...  

The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.


2018 ◽  
Author(s):  
Shengchao Liu ◽  
Moayad Alnammi ◽  
Spencer S. Ericksen ◽  
Andrew F. Voter ◽  
Gene E. Ananiev ◽  
...  

Virtual (computational) high-throughput screening provides a strategy for prioritizing compounds for experimental screens, but the choice of virtual screening algorithm depends on the dataset and evaluation strategy. We consider a wide range of ligand-based machine learning and docking-based approaches for virtual screening on two protein-protein interactions, PriA-SSB and RMI-FANCM, and present a strategy for choosing which algorithm is best for prospective compound prioritization. Our workflow identifies a random forest as the best algorithm for these targets over more sophisticated neural network-based models. The top 250 predictions from our selected random forest recover 37 of the 54 active compounds from a library of 22,434 new molecules assayed on PriA-SSB. We show that virtual screening methods that perform well in public datasets and synthetic benchmarks, like multi-task neural networks, may not always translate to prospective screening performance on a specific assay of interest.
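
The reported counts (37 of 54 actives recovered in the top 250 of 22,434 molecules) can be turned into simple summary numbers. The sketch below computes recall, hit rate, and an enrichment factor over random selection. The enrichment factor is a common screening metric introduced here for illustration; the abstract does not state that the authors used it, and the function name `top_k_metrics` is hypothetical.

```python
def top_k_metrics(hits_in_top_k, k, total_actives, library_size):
    """Summarize a top-k selection from a screened library."""
    hit_rate = hits_in_top_k / k                # fraction of selected compounds that are active
    base_rate = total_actives / library_size    # fraction active under random selection
    return {
        "recall": hits_in_top_k / total_actives,  # fraction of all actives recovered
        "hit_rate": hit_rate,
        "enrichment_factor": hit_rate / base_rate,
    }


# Counts taken from the abstract above.
metrics = top_k_metrics(hits_in_top_k=37, k=250, total_actives=54, library_size=22434)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Under these counts, the top 250 contain about 68.5% of the actives at a hit rate of 14.8%, roughly 61 times the 0.24% expected from picking 250 molecules at random.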


Proteomes ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 14 ◽  
Author(s):  
Emmalyn J. Dupree ◽  
Madhuri Jayathirtha ◽  
Hannah Yorkey ◽  
Marius Mihasan ◽  
Brindusa Alina Petre ◽  
...  

Proteomics is the field of study that includes the analysis of proteins, from either a basic science perspective or a clinical one. Proteins can be investigated for their abundance, variety of proteoforms due to post-translational modifications (PTMs), and their stable or transient protein–protein interactions. This can be especially beneficial in the clinical setting when studying proteins involved in different diseases and conditions. Here, we aim to describe a bottom-up proteomics workflow from sample preparation to data analysis, including all of its benefits and pitfalls. We also describe potential improvements in this type of proteomics workflow for the future.


2003 ◽  
Vol 4 (1) ◽  
pp. 16-19 ◽  
Author(s):  
Sandra Orchard ◽  
Paul Kersey ◽  
Henning Hermjakob ◽  
Rolf Apweiler

The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics and to facilitate data comparison, exchange and verification. Initially the fields of protein–protein interactions (PPI) and mass spectroscopy have been targeted and the inaugural meeting of the PSI addressed the questions of data storage and exchange in both of these areas. The PPI group rapidly reached consensus as to the minimum requirements for a data exchange model; an XML draft is now being produced. The mass spectroscopy group have achieved major advances in the definition of a required data model and working groups are currently taking these discussions further. A further meeting is planned in January 2003 to advance both these projects.


Parasitology ◽  
2012 ◽  
Vol 139 (9) ◽  
pp. 1103-1118 ◽  
Author(s):  
J. M. WASTLING ◽  
S. D. ARMSTRONG ◽  
R. KRISHNA ◽  
D. XIA

Systems biology aims to integrate multiple biological data types such as genomics, transcriptomics and proteomics across different levels of structure and scale; it represents an emerging paradigm in the scientific process which challenges the reductionism that has dominated biomedical research for hundreds of years. Systems biology will nevertheless only be successful if the technologies on which it is based are able to deliver the required type and quality of data. In this review we discuss how well positioned proteomics is to deliver the data necessary to support meaningful systems modelling in parasite biology. We summarise the current state of identification proteomics in parasites, but argue that a new generation of quantitative proteomics data is now needed to underpin effective systems modelling. We discuss the challenges faced in acquiring more complete knowledge of protein post-translational modifications, protein turnover and protein-protein interactions in parasites. Finally we highlight the central role of proteome-informatics in ensuring that proteomics data is readily accessible to the user-community and can be translated and integrated with other relevant data types.


2021 ◽  
Vol 8 ◽  
Author(s):  
Mariela González-Avendaño ◽  
Simón Zúñiga-Almonacid ◽  
Ian Silva ◽  
Boris Lavanderos ◽  
Felipe Robinson ◽  
...  

Mass spectrometry-based proteomics methods are widely used to identify and quantify protein complexes involved in diverse biological processes. Specifically, tandem mass spectrometry methods represent an accurate and sensitive strategy for identifying protein-protein interactions. However, most of these approaches provide only lists of peptide fragments associated with a target protein, without performing further analyses to discriminate physical or functional protein-protein interactions. Here, we present the PPI-MASS web server, which provides an interactive analytics platform to identify protein-protein interactions with pharmacological potential by filtering a large protein set according to different biological features. Starting from a list of proteins detected by MS-based methods, PPI-MASS integrates an automated pipeline to obtain information on each protein from freely accessible databases. The collected data include protein sequence, functional and structural properties, associated pathologies and drugs, as well as location and expression in human tissues. Based on this information, users can manipulate different filters in the web platform to identify candidate proteins that may establish physical contacts with a target protein. Thus, our server offers a simple but powerful tool to detect novel protein-protein interactions, avoiding tedious and time-consuming data postprocessing. To test the web server, we employed the interactome of the TRPM4 and TMPRSS11a proteins as a use case. From these data, protein-protein interactions were identified, which have been validated through biochemical and bioinformatic studies. Accordingly, our web platform provides a comprehensive and complementary tool for identifying protein-protein complexes, assisting the future design of associated therapies.


2018 ◽  
Author(s):  
Li Chen ◽  
Bai Zhang ◽  
Michael Schnaubelt ◽  
Punit Shah ◽  
Paul Aiyetan ◽  
...  

Rapid development and wide adoption of mass spectrometry-based proteomics technologies have empowered scientists to study proteins and their modifications in complex samples on a large scale. This progress has also created unprecedented challenges for individual labs to store, manage and analyze proteomics data, both in the cost of proprietary software and high-performance computing, and in the long processing time that discourages on-the-fly changes of data processing settings required in explorative and discovery analysis. We developed an open-source, cloud computing-based pipeline, MS-PyCloud, with graphical user interface (GUI) support, for LC-MS/MS data analysis. The major components of this pipeline include data file integrity validation, MS/MS database search for spectral assignment, false discovery rate estimation, protein inference, determination of protein post-translational modifications, and quantitation of specific (modified) peptides and proteins. To ensure the transparency and reproducibility of data analysis, MS-PyCloud includes open source software tools with comprehensive testing and versioning for spectrum assignments. Leveraging public cloud computing infrastructure via Amazon Web Services (AWS), MS-PyCloud scales seamlessly based on analysis demand to achieve fast and efficient performance. Application of the pipeline to the analysis of large-scale iTRAQ/TMT LC-MS/MS data sets demonstrated the effectiveness and high performance of MS-PyCloud. The software can be downloaded at: https://bitbucket.org/mschnau/ms-pycloud/downloads/


2015 ◽  
Vol 112 (16) ◽  
pp. 5011-5016 ◽  
Author(s):  
Robert Opitz ◽  
Matthias Müller ◽  
Cédric Reuter ◽  
Matthias Barone ◽  
Arne Soicke ◽  
...  

Small-molecule competitors of protein–protein interactions are urgently needed for functional analysis of large-scale genomics and proteomics data. Particularly abundant, yet so far undruggable, targets include domains specialized in recognizing proline-rich segments, including Src-homology 3 (SH3), WW, GYF, and Drosophila enabled (Ena)/vasodilator-stimulated phosphoprotein (VASP) homology 1 (EVH1) domains. Here, we present a modular strategy to obtain an extendable toolkit of chemical fragments (ProMs) designed to replace pairs of conserved prolines in recognition motifs. As proof-of-principle, we developed a small, selective, peptidomimetic inhibitor of Ena/VASP EVH1 domain interactions. Highly invasive MDA MB 231 breast-cancer cells treated with this ligand showed displacement of VASP from focal adhesions, as well as from the front of lamellipodia, and strongly reduced cell invasion. General applicability of our strategy is illustrated by the design of an ErbB4-derived ligand containing two ProM-1 fragments, targeting the yes-associated protein 1 (YAP1)-WW domain with a fivefold higher affinity.


2003 ◽  
Vol 4 (2) ◽  
pp. 203-206 ◽  
Author(s):  
Sandra Orchard ◽  
Paul Kersey ◽  
Weimin Zhu ◽  
Luisa Montecchi-Palazzi ◽  
Henning Hermjakob ◽  
...  

The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics and to facilitate data comparison, exchange and verification. Rapid progress has been made in the development of common standards for data exchange in the fields of both mass spectrometry and protein–protein interactions since the first PSI meeting [1]. Both hardware and software manufacturers have agreed to work to ensure that a proteomics-specific extension is created for the emerging ASTM mass spectrometry standard and the data model for a proteomics experiment has advanced significantly. The Protein–Protein Interactions (PPI) group expects to publish the Level 1 PSI data exchange format for protein–protein interactions by early summer this year, and discussion as to the additional content of Level 2 has been initiated.

