Python in proteomics

10.7287/peerj.preprints.27736v1

2019

Author(s):

Hannes L Röst

Keyword(s):

Mass Spectrometry

Data Analysis

Open Source

Biological Sequence

Post Translational Modification

Development Environment

Spectrometric Data

Graphical Environment

High Flexibility

Prototype Software

Python is a versatile scripting language that is widely used in industry and academia. In bioinformatics, multiple packages support data analysis with Python, ranging from biological sequence analysis with Biopython, to structural modeling and visualization with packages like PyMOL and PyRosetta, to numerical computation and advanced plotting with NumPy/SciPy. In the proteomics community, Python came into wide use around 2012, when several mature Python packages were published, including pymzML, Pyteomics and pyOpenMS. This has led to ever-increasing interest in the Python programming language in the proteomics and mass spectrometry community. The number of publications referencing or using Python has risen eightfold since 2012 (compared with the same time period before 2012), and multiple open-source Python packages now support mass spectrometric data analysis and processing. Computing and data analysis in mass spectrometry are very diverse and in many cases must be tailored to a specific experiment. Often, multiple analysis steps (identification, quantification, post-translational modification analysis, filtering, false discovery rate (FDR) analysis, etc.) have to be performed in an analysis pipeline, which requires high flexibility. This is where Python truly shines, thanks to its flexibility, its visualization capabilities and the ability to extend computation with a large number of powerful libraries. Python can be used to quickly prototype software and to combine existing libraries into powerful analysis workflows while avoiding the trap of reinventing the wheel for a new project. Here, we describe data analysis with Python using the pyOpenMS package. Extended documentation and a tutorial can also be found online at https://pyopenms.readthedocs.io. To allow the reader to follow all steps of the tutorial, we also describe the installation process of the software.
Our installation is based on Anaconda, an open-source Python distribution that includes the Spyder integrated development environment (IDE), which allows development with pyOpenMS in a graphical environment.
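FDR analysis is listed above as one step of a typical pipeline. The sketch below is an illustration only: it is plain Python, not pyOpenMS code, and not taken from the tutorial. It estimates q-values with the common target-decoy approach, assuming each peptide-spectrum match (PSM) is given as a `(score, is_decoy)` pair in which a higher score is better. The function name `target_decoy_qvalues` and the example scores are hypothetical.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    psms: List[Tuple[float, bool]]
) -> List[Tuple[float, bool, float]]:
    """Estimate q-values with a simple target-decoy approach.

    psms: (score, is_decoy) pairs, where a higher score means a better match.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    Tied scores are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)

    # FDR at each score threshold, estimated as #decoys / #targets at or above it
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. a running minimum taken from the worst-scoring PSM upwards
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


# Hypothetical scores for illustration only
psms = [(50.0, False), (45.0, False), (40.0, True),
        (38.0, False), (30.0, False), (20.0, True)]
for score, is_decoy, q in target_decoy_qvalues(psms):
    print(f"{score:5.1f}  {'decoy ' if is_decoy else 'target'}  q={q:.3f}")
```

Accepting all target PSMs with q ≤ 0.01 would keep only the two top-scoring targets in this toy example. Real tools additionally handle tied scores, may use different FDR estimators and work on far larger score distributions.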

Discrimination of GalNAc (4S/6S) sulfation sites in chondroitin sulfate disaccharides by chip-based nanoelectrospray multistage mass spectrometry

Open Chemistry

10.2478/s11532-009-0070-7

2009

Vol 7 (4)

pp. 752-759

Cited By ~ 10

Author(s):

Corina Flangea

Alina Serb

Catalin Schiopu

Sorin Tudor

Eugen Sisu

...

Keyword(s):

Mass Spectrometry

Chondroitin Sulfate

Ion Trap

High Capacity

Ester Group

Negative Ion

Post Translational Modification

Spectrometric Data

Group Position

Fragmentation Patterns

The sulfation pattern within chondroitin sulfate (CS) glycosaminoglycan (GAG) chains is an important post-translational modification that regulates their interaction with proteins. In this context, highly efficient and reproducible analytical methods for investigating CS sulfation patterns are much needed. In this study we report a novel method for the straightforward determination of N-acetylgalactosamine (GalNAc) sulfation sites in chondroitin sulfate disaccharides. Our protocol combines fully automated chip-based nanoelectrospray (nanoESI) for analyte infusion and ionization in negative ion mode with multistage (MSn) collision-induced dissociation (CID) on a high capacity ion trap (HCT) mass spectrometer, generating sequence ions diagnostic of the sulfate ester group position within GalNAc residues. The feasibility of this approach is demonstrated here on chondroitin 6-O-sulfate and chondroitin 4-O-sulfate disaccharides. Fragmentation patterns obtained at the MS2 and MS3 stages provided the first mass spectrometric data from which the sulfation site(s) within the GalNAc monosaccharide ring could be unequivocally deciphered. Hence, the method allowed 4S/6S sulfation sites to be discriminated solely on the basis of MS and multistage MS evidence.
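The method relies on MS2/MS3 fragmentation rather than on precursor mass alone. A short calculation, under stated assumptions, suggests why: 4-O- and 6-O-sulfated disaccharides are positional isomers with the same elemental formula, so their deprotonated precursor ions have identical m/z. The formula used below (C14H21NO14S, free acid, for the unsaturated monosulfated disaccharide) is an assumption to be checked against a reference, and the helper names are hypothetical.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
}
PROTON = 1.007276466812  # proton mass (Da)


def monoisotopic_mass(formula: dict) -> float:
    """Neutral monoisotopic mass from an element -> count mapping."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def negative_mode_mz(mass: float, charge: int) -> float:
    """m/z of the deprotonated ion [M - zH]^z- formed in negative ion mode."""
    return (mass - charge * PROTON) / charge


# Assumed composition of a monosulfated, unsaturated chondroitin disaccharide
# (free acid). The 4-O- and 6-O-sulfated forms are positional isomers, so they
# share the same elemental formula.
di_4s = {"C": 14, "H": 21, "N": 1, "O": 14, "S": 1}
di_6s = dict(di_4s)

for name, formula in (("4S", di_4s), ("6S", di_6s)):
    mass = monoisotopic_mass(formula)
    print(name, [round(negative_mode_mz(mass, z), 4) for z in (1, 2)])
```

Both isomers give the same [M-H]- value (about m/z 458.06), so only diagnostic fragment ions can tell the sulfation sites apart.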

OpenMS: a flexible open-source software platform for mass spectrometry data analysis

Nature Methods

10.1038/nmeth.3959

2016

Vol 13 (9)

pp. 741-748

Cited By ~ 222

Author(s):

Hannes L Röst

Timo Sachsenberg

Stephan Aiche

Chris Bielow

Hendrik Weisser

...

Keyword(s):

Mass Spectrometry

Data Analysis

Open Source

Open Source Software

Mass Spectrometry Data

Software Platform

SIESTA as a universal unbiased proteomics approach for identification and prioritization of enzyme substrates

10.21203/rs.3.pex-1327/v1

2021

Author(s):

Amir Ata Saei

Christian M. Beusch

Pierre Sabatier

Juan Astorga Wells

Hassan Gharibi

...

Keyword(s):

Mass Spectrometry

Thermal Stability

Thermal Analysis

Data Analysis

Experimental Design

Sample Preparation

Sample Analysis

Post Translational Modification

Enzyme Substrates

Proteomics Approach

This protocol describes the proteomics technique called System-wide Identification and prioritization of Enzyme Substrates by Thermal Analysis, or SIESTA [1,2]. SIESTA can be used for the universal discovery of enzyme substrates that shift in thermal stability or solubility upon post-translational modification (PTM). Experimental design, proteomics sample preparation and data analysis are the key stages of this protocol. Data analysis can be performed using our SIESTA package hosted on GitHub [3]. When performed with classical thermal proteome profiling (TPP), as in the current protocol, the protocol takes 5 days for sample preparation and 14 days for sample analysis by mass spectrometry. If our high-throughput version of TPP, the Proteome Integral Solubility Alteration (PISA) assay [4], is used instead, the mass spectrometry analysis time is reduced to 1-2 days for the same number of conditions.
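PISA is described as an integral solubility readout. As a rough, hypothetical sketch (not the SIESTA package from GitHub, and without its normalization or statistics), the idea can be expressed as summing each protein's soluble abundance across the temperature gradient and comparing treated with control samples on a log2 scale. The function names, protein IDs and abundances below are invented for illustration.

```python
import math
from typing import Dict, List


def integrated_solubility(soluble: List[float]) -> float:
    """Sum soluble-fraction abundances over the temperature gradient,
    roughly mimicking the pooling of heated samples before measurement."""
    return sum(soluble)


def solubility_log2_ratios(
    treated: Dict[str, List[float]],
    control: Dict[str, List[float]],
) -> Dict[str, float]:
    """log2(treated / control) of integrated solubility for each protein
    quantified in both conditions (no normalization or statistics)."""
    ratios = {}
    for protein in sorted(treated.keys() & control.keys()):
        t = integrated_solubility(treated[protein])
        c = integrated_solubility(control[protein])
        if t > 0 and c > 0:  # skip proteins without signal in one condition
            ratios[protein] = math.log2(t / c)
    return ratios


# Hypothetical relative abundances at increasing temperatures
treated = {"P1": [1.0, 1.0, 1.0, 0.6, 0.4], "P2": [1.0, 0.5, 0.5]}
control = {"P1": [1.0, 0.5, 0.25, 0.25, 0.0], "P2": [1.0, 0.5, 0.5]}
print(solubility_log2_ratios(treated, control))
```

A positive ratio would indicate higher integrated solubility in the treated sample. Deciding which shifts are meaningful candidates still requires replicates and statistical testing that this sketch omits.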

A Comprehensive, Open-source Platform for Mass Spectrometry-based Glycoproteomics Data Analysis

Molecular & Cellular Proteomics

10.1074/mcp.m117.068239

2017

Vol 16 (11)

pp. 2032-2047

Cited By ~ 18

Author(s):

Gang Liu

Kai Cheng

Chi Y. Lo

Jun Li

Jun Qu

...

Keyword(s):

Mass Spectrometry

Data Analysis

Open Source

Free Open Source Software for Protein and Peptide Mass Spectrometry-based Science

Current Protein and Peptide Science

10.2174/1389203722666210118160946

2021

Vol 22

Author(s):

Filippo Rusconi

Keyword(s):

Mass Spectrometry

Data Analysis

Open Source

Open Source Software

Interactive Graphics

Standard Format

Mass Data

Peptide Mass

Peptide Mass Spectrometry

Proteins And Peptides

In biology, and specifically in protein and peptide science, the power of mass spectrometry lies in its applicability to a vast range of problems. Mass spectrometry can be applied to identify proteins and peptides in complex mixtures, to identify and locate post-translational modifications, to characterize the structure of proteins and peptides down to the most detailed level, or to detect protein–ligand non-covalent interactions. Thanks to the Free and Open Source Software (FOSS) movement, scientists have ample opportunities to deepen their software development skills and write software that solves mass spectrometric data analysis problems. After raw data files are converted to open standard formats, the entire range of data analysis tasks can now be performed on FOSS platforms such as GNU/Linux, using only FOSS solutions. This review presents a brief history of open mass spectrometry file formats and goes on to describe FOSS projects commonly used in two areas of protein and peptide mass spectrometry: identification projects that mostly involve automated pipelines, like proteomics and peptidomics, and bio-structural characterization projects that most often involve manual scrutiny of the mass data. Projects of the latter kind usually require software that lets the user delve into the mass data in an interactive, graphics-oriented manner. Software projects were thus categorized according to two criteria: software libraries for developers vs desktop-based graphical user interface software for the end user, and automated pipeline-based data processing vs interactive graphics-based mass data scrutiny.
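Analysis starts from raw files converted to open standard formats. In mzML, a widely used open format, peak arrays are stored as base64-encoded, optionally zlib-compressed arrays of little-endian 32- or 64-bit floats. The following stdlib-only sketch decodes such a payload. It assumes the precision and compression settings have already been read from the array's controlled-vocabulary annotations and is not a full mzML parser; the function names are hypothetical.

```python
import base64
import struct
import zlib
from typing import List, Sequence


def _format_code(precision: int) -> str:
    """struct format character for the given float precision in bits."""
    if precision == 64:
        return "d"  # 64-bit double
    if precision == 32:
        return "f"  # 32-bit float
    raise ValueError("precision must be 32 or 64")


def decode_binary_array(text: str, precision: int = 64,
                        zlib_compressed: bool = True) -> List[float]:
    """Decode an mzML-style binary data array:
    base64 text -> (optional zlib inflate) -> little-endian floats."""
    raw = base64.b64decode(text)
    if zlib_compressed:
        raw = zlib.decompress(raw)
    count = len(raw) // (precision // 8)
    return list(struct.unpack(f"<{count}{_format_code(precision)}", raw))


def encode_binary_array(values: Sequence[float], precision: int = 64,
                        zlib_compressed: bool = True) -> str:
    """Inverse of decode_binary_array, handy for round-trip testing."""
    raw = struct.pack(f"<{len(values)}{_format_code(precision)}", *values)
    if zlib_compressed:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


mz = [100.5, 200.25, 300.125]
payload = encode_binary_array(mz, precision=32, zlib_compressed=False)
print(payload, decode_binary_array(payload, precision=32, zlib_compressed=False))
```

For real files, FOSS libraries such as pyOpenMS, Pyteomics or pymzML handle the surrounding XML structure, indexing and other encoding options.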

mMass data miner: an open source alternative for mass spectrometric data analysis

Rapid Communications in Mass Spectrometry

10.1002/rcm.3444

2008

Vol 22 (6)

pp. 905-908

Cited By ~ 282

Author(s):

Martin Strohalm

Martin Hassman

Bedřich Košata

Milan Kodíček

Keyword(s):

Data Analysis

Open Source

Mass Spectrometric

Mass Spectrometric Data

Spectrometric Data

Dicarbonyl derived post-translational modifications: chemistry bridging biology and aging-related disease

Essays in Biochemistry

10.1042/ebc20190057

2020

Vol 64 (1)

pp. 97-110

Author(s):

Christian Sibbersen

Mogens Johannsen

Keyword(s):

Mass Spectrometry

Future Research

Amino Acid Residues

Post Translational Modification

Dicarbonyl Compounds

Protein Targets

Post Translational Modifications

Reactive Protein

Glycation End Products

Direct Mass Spectrometry

In living systems, nucleophilic amino acid residues are prone to non-enzymatic post-translational modification by electrophiles. α-Dicarbonyl compounds are a special type of electrophile that can react irreversibly with lysine, arginine, and cysteine residues via complex mechanisms to form post-translational modifications known as advanced glycation end-products (AGEs). Glyoxal, methylglyoxal, and 3-deoxyglucosone are the major endogenous dicarbonyls, with methylglyoxal being the best studied. Dicarbonyl compounds can form via several routes, most originating from glucose and glucose metabolism, such as the non-enzymatic decomposition of glycolytic intermediates and fructosyl amines. Although dicarbonyls are continuously removed, mainly via the glyoxalase system, several conditions increase dicarbonyl concentration and thereby AGE formation. AGEs have been implicated in diabetes and aging-related diseases, and for this reason elucidating their structures as well as their protein targets is of great interest. Although the dicarbonyls and reactive protein side chains are relatively simple in nature, the structures of the adducts and their mechanisms of formation are far from trivial. Furthermore, detecting sites of modification can be demanding, and current best practices rely either on direct mass spectrometry or on various enrichment methods based on antibodies or click chemistry followed by mass spectrometry. Future research into the structure of these adducts and the protein targets of dicarbonyl compounds may improve understanding of the mechanisms underlying diabetes and aging-related physiological damage.

Anacyclamide D8P, a prenylated cyanobactin from a Sphaerospermopsis sp. cyanobacterium

10.26434/chemrxiv.5346739.v1

2017

Cited By ~ 1

Author(s):

Joana Martins

Niina Leikoski

Matti Wahlsten

Joana Azevedo

Jorge Antunes

...

Keyword(s):

Gene Cluster

Cyclic Peptides

Biosynthetic Gene Cluster

The Novel

Tyrosine Residue

Biosynthetic Gene

Post Translational Modification

Biological Assays

Mass Spectrometric Data

Spectrometric Data

Cyanobactins are a family of linear and cyclic peptides produced through the post-translational modification of short precursor peptides. Anacyclamides are macrocyclic cyanobactins with highly diverse sequences that are common in the genus Anabaena. A mass spectrometry-based screening of potential cyanobactin producers led to the discovery of a new prenylated member of this family, anacyclamide D8P (1), from Sphaerospermopsis sp. LEGE 00249. The anacyclamide biosynthetic gene cluster (acy) encoding this novel macrocyclic prenylated cyanobactin was sequenced. Heterologous expression of the acy gene cluster in Escherichia coli established the connection between the genomic and mass spectrometric data. Unambiguously establishing the type and site of prenylation required full structural elucidation of 1 by nuclear magnetic resonance (NMR), which demonstrated that forward prenylation had occurred on the tyrosine residue. Compound 1 was tested in pharmacologically or ecologically relevant biological assays and showed moderate antimicrobial activity against the fouling bacterium Halomonas aquamarina CECT 5000.

Towards Computational Models of Identifying Protein Ubiquitination Sites

Current Drug Targets

10.2174/1389450119666180924150202

2019

Vol 20 (5)

pp. 565-578

Cited By ~ 1

Author(s):

Lidong Wang

Ruijun Zhang

Keyword(s):

Computational Methods

Computational Models

Feature Representation

Biological Sequence

Post Translational Modification

Test Dataset

Protein Ubiquitination

Protein Functions

Independent Test Dataset

Benchmark Datasets

Ubiquitination is an important post-translational modification (PTM) that regulates protein function and is associated with cancer, cardiovascular and other diseases. Recent initiatives have focused on detecting potential ubiquitination sites by combining physicochemical laboratory approaches with computational methods. Identifying ubiquitination sites through laboratory tests is particularly affected by the transient and reversible nature of ubiquitination, and is also costly and time-consuming. Computational methods have been shown to be effective in extracting potential rules or inferences from collections of biological sequences. To date, computational prediction has been one of the key approaches to identifying ubiquitination sites, and numerous state-of-the-art methods based on machine learning and statistical analysis have been developed for this task. In the present study, we summarize how benchmark datasets are constructed, together with the feature representation methods, feature selection approaches and classifiers used in several previous publications. To explore development trends in the identification of ubiquitination sites, we constructed an independent test dataset and report the prediction results obtained from five prediction tools, together with a related discussion.
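The study summarizes feature representation methods for sequence-based site prediction. One of the simplest schemes, sketched here under clearly stated assumptions, extracts a fixed-length window centered on each candidate lysine (the usual attachment site for ubiquitin) and encodes it by amino acid composition. The window size, padding character, function names and example sequence are hypothetical; the five tools evaluated in the study may use entirely different encodings and classifiers.

```python
from collections import Counter
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def lysine_windows(sequence: str, flank: int = 3,
                   pad: str = "-") -> List[Tuple[int, str]]:
    """Return (0-based position, window) for each lysine (K), with `flank`
    residues on each side; sequence termini are padded with `pad`."""
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # residue i sits at index i + flank in the padded string, so the
            # window spans padded[i : i + 2 * flank + 1] with K in the middle
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows


def composition_features(window: str) -> List[float]:
    """Amino acid composition (relative frequency of the 20 standard
    residues) of a window, ignoring padding characters."""
    counts = Counter(aa for aa in window if aa in AMINO_ACIDS)
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical sequence for illustration
for position, window in lysine_windows("MKTAYIAKQR", flank=3):
    print(position, window, composition_features(window))
```

The resulting 20-dimensional vectors could then be passed to any classifier. Composition discards residue order, which is one reason position-aware encodings are also common.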
