Peptide Identification Using Peptide Amino Acid Attribute Vectors

<p>De novo peptide sequencing algorithms have been developed to identify peptides from tandem mass spectra (MS/MS) in proteomics, and can be used to identify and discover novel peptides and proteins for which no database is available. Despite improvements in MS instrumentation and de novo sequencing methods, a significant number of CID MS/MS spectra remain unassigned by current algorithms, often leading to low-confidence peptide assignments. Moreover, current algorithms often fail to construct completely matched sequences and produce only partial matches, so identification of full-length peptides remains challenging. Another major challenge is noise in MS/MS spectra, which makes the data highly imbalanced. Missing peaks, caused by incomplete MS fragmentation, also make it more difficult to infer a full-length peptide sequence. In addition, the large search space of all possible amino acid sequences for each spectrum leads to a high false discovery rate. This thesis focuses on improving the performance of current methods by developing new algorithms for three steps, namely preprocessing, sequence optimisation and post-processing, using machine learning for more comprehensive interrogation of MS/MS datasets. From the machine learning point of view, the three steps can be addressed by solving different tasks such as classification, optimisation, and symbolic regression. Since Evolutionary Algorithms (EAs), as effective global search techniques, have shown promising results in solving these problems, this thesis investigates the capability of EAs to improve de novo peptide sequencing. In the preprocessing step, this thesis proposes an effective method based on genetic programming (GP) for classifying signal and noise peaks in highly imbalanced MS/MS spectra, with the aim of improving the reliability of peptide identification. 
The results show that the proposed algorithm is the most stable classification method across various noise ratios, outperforming six other benchmark classification algorithms. The experimental results show a significant improvement in high-confidence peptide assignments to MS/MS spectra when the data is preprocessed by the proposed GP method. Moreover, this thesis proposes the first multi-objective GP approach for classification of peaks in MS/MS data, which aims to maximise both the accuracy of the minority class (signal peaks) and the accuracy of the majority class (noise peaks). The results show that the multi-objective GP method outperforms the single-objective GP algorithm and a popular multi-objective approach in terms of retaining more signal peaks and removing more noise peaks. The multi-objective GP approach significantly improved the reliability of peptide identification. This thesis also proposes a method based on a genetic algorithm (GA) to solve the complex optimisation task of de novo peptide sequencing, aiming at constructing full-length sequences. The proposed GA method benefits from the GA's ability to search a large space of potential amino acid sequences for the most likely full-length sequence. The experimental results show that the proposed method outperforms the most commonly used de novo sequencing method at both the amino acid level and the peptide level. This thesis also proposes a novel post-processing method for re-scoring and re-ranking the peptide-spectrum matches (PSMs) produced by de novo peptide sequencing, aiming at minimising the false discovery rate. The proposed GP method evolves computer programs that perform regression and classification simultaneously in order to generate an effective scoring function for finding the correct PSMs among many incorrect ones. 
The results show that the new GP-based PSM scoring function significantly improves the identification of full-length peptides when it is used to post-process the de novo sequencing results.</p>
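
The two objectives named in the abstract above (accuracy on the minority signal class and accuracy on the majority noise class) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `per_class_accuracy` and `dominates`, and the label encoding 1 = signal, 0 = noise, are assumptions made here for illustration only.

```python
from typing import Sequence, Tuple


def per_class_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (signal_accuracy, noise_accuracy).

    Assumed encoding (hypothetical): 1 = signal peak, 0 = noise peak.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    signal_total = sum(1 for t in y_true if t == 1)
    noise_total = len(y_true) - signal_total
    signal_hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    noise_hits = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # Guard against a class being absent from the data.
    signal_acc = signal_hits / signal_total if signal_total else 0.0
    noise_acc = noise_hits / noise_total if noise_total else 0.0
    return signal_acc, noise_acc


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Pareto dominance when every objective is maximised."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


# One signal peak among nine noise peaks: predicting "noise" everywhere
# scores 90% overall accuracy but finds no signal at all.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
all_noise = [0] * 10
print(per_class_accuracy(labels, all_noise))  # (0.0, 1.0)
```

Scoring the two classes separately is what exposes the failure that overall accuracy hides on imbalanced spectra; a multi-objective search can then compare candidate classifiers by Pareto dominance instead of a single number.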

The structure of the Drosophila melanogaster sex peptide: Identification of hydroxylated isoleucine and a strain variation in the pattern of amino acid hydroxylation

Insect Biochemistry and Molecular Biology ◽

10.1016/j.ibmb.2020.103414 ◽

2020 ◽

Vol 124 ◽

pp. 103414

Author(s):

Sebastian Sturm ◽

Adam Dowle ◽

Neil Audsley ◽

R. Elwyn Isaac

Keyword(s):

Drosophila Melanogaster ◽

Amino Acid ◽

Peptide Identification ◽

Strain Variation ◽

Sex Peptide ◽

Acid Hydroxylation

Validation of Peptide Identification Results in Proteomics Using Amino Acid Counting

PROTEOMICS ◽

10.1002/pmic.201800117 ◽

2018 ◽

Vol 18 (23) ◽

pp. 1800117 ◽

Cited By ~ 5

Author(s):

Julia A. Bubis ◽

Lev I. Levitsky ◽

Mark V. Ivanov ◽

Mikhail V. Gorshkov

Keyword(s):

Amino Acid ◽

Peptide Identification

Vibrational spectroscopy of a non-aromatic amino acid-based model peptide: identification of the γ-turn motif of the peptide backbone

Physical Chemistry Chemical Physics ◽

10.1039/b417204c ◽

2005 ◽

Vol 7 (1) ◽

pp. 13-15 ◽

Cited By ~ 39

Author(s):

Isabelle Compagnon ◽

Jos Oomens ◽

Joost Bakker ◽

Gerard Meijer ◽

Gert von Helden

Keyword(s):

Amino Acid ◽

Vibrational Spectroscopy ◽

Aromatic Amino Acid ◽

Peptide Identification ◽

Model Peptide ◽

Peptide Backbone

Improved short peptide identification using HILIC–MS/MS: Retention time prediction model based on the impact of amino acid position in the peptide sequence

Food Chemistry ◽

10.1016/j.foodchem.2014.10.104 ◽

2015 ◽

Vol 173 ◽

pp. 847-854 ◽

Cited By ~ 45

Author(s):

Solène Le Maux ◽

Alice B. Nongonierma ◽

Richard J. FitzGerald

Keyword(s):

Amino Acid ◽

Prediction Model ◽

Retention Time ◽

Peptide Identification ◽

Amino Acid Position ◽

Peptide Sequence ◽

Short Peptide ◽

Retention Time Prediction ◽

Time Prediction ◽

The Impact

Using Ion Mobility Data to Improve Peptide Identification: Intrinsic Amino Acid Size Parameters

Journal of Proteome Research ◽

10.1021/pr1011312 ◽

2011 ◽

Vol 10 (5) ◽

pp. 2318-2329 ◽

Cited By ~ 39

Author(s):

Stephen J. Valentine ◽

Michael A. Ewing ◽

Jonathan M. Dilger ◽

Matthew S. Glover ◽

Scott Geromanos ◽

...

Keyword(s):

Amino Acid ◽

Ion Mobility ◽

Peptide Identification ◽

Mobility Data

Synthesis, extraction and identification of meat bioactive peptides: a review

IOP Conference Series Earth and Environmental Science ◽

10.1088/1755-1315/888/1/012058 ◽

2021 ◽

Vol 888 (1) ◽

pp. 012058

Author(s):

Edy Susanto ◽

Anik Fadlilah ◽

Muhammad Fathul Amin

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Functional Food ◽

Bioactive Peptides ◽

Peptide Identification ◽

Bioactive Peptide ◽

Covalent Bonds ◽

Peptide Bonds ◽

Body Location ◽

Potential Use

Abstract: The consumption of meat should take into account the concept of functional food. Meat has high-quality protein and contains bioactive peptide compounds. Amino acids are the components of bioactive peptides, joined by covalent bonds known as amide or peptide bonds. Much current research focuses on bioactive peptide compounds isolated from myofibrillar and sarcoplasmic proteins, and on methods for their synthesis, extraction, and identification. This study used a systematic review to examine the structure of the amino acids that are the source of bioactive components, and the principles of synthesis, extraction and identification of bioactive peptides in meat. The paper highlights findings on the structure of amino acids in meat. The proportion of amino acids also differs between animal body locations. The review found that more than 170 peptides were released from the main structural myofibrillar proteins (actin, myosin) and sarcoplasmic muscle proteins, and it covers the synthesis, extraction and identification of bioactive peptides in meat as well as their potential use as functional food.

Approaches to Pharmaceutical Analysis of Modern Peptide and Oligonucleotide Products as Illustrated by a Small Interfering RNA-Based Novel Therapeutic for the Treatment of Bronchial Asthma

BIOpreparations Prevention Diagnosis Treatment ◽

10.30895/2221-996x-2018-18-3-184-190 ◽

2018 ◽

Vol 18 (3) ◽

pp. 184-190

Author(s):

L. M. Krasnykh ◽

V. V. Smirnov ◽

G. V. Ramenskaya ◽

G. F. Vasilenko ◽

I. P. Shilovsky ◽

...

Keyword(s):

Mass Spectrometry ◽

Amino Acid ◽

Small Interfering Rna ◽

Peptide Identification ◽

Test Methods ◽

Complex Data ◽

Amino Acid Residues ◽

Combination Of Methods ◽

Interfering Rna

Methods used to control the quality of peptide products depend on the level of development of analytical and bioorganic chemistry, and on the level of instrumentation. Peptide identification is a difficult task whose difficulty depends largely on the complexity of the peptide's structure. No comprehensive and simple test exists except NMR, which is rather expensive and time-consuming and involves complex data interpretation. Moreover, it does not allow for unambiguous determination of the peptide purity and formula (amino acid composition, sequence, chirality of amino acid residues). For this reason, a combination of methods is often used, including amino acid analysis, TLC/HPLC and mass spectrometry, and, less frequently, sequencing. Current international practice in peptide analysis is to use HPLC in combination with mass spectrometric, mainly tandem (HPLC-MS/MS), detection. According to literature sources, the amino acid sequence of linear peptides can be analysed by digestion with various enzymes followed by identification of the proteolysis products by mass spectrometry. This article presents approaches to the development of test methods for analysis of purity and identification testing of a small interfering RNA-based novel medicinal product, which will help standardise and control the quality of the production process.
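
The enzyme-then-mass-spectrometry idea in the abstract above can be sketched in Python as an in-silico digest followed by a mass calculation. The abstract does not name an enzyme, so the choice of trypsin is an assumption made here, using the commonly cited rule of cleavage after K or R except before P, with no missed cleavages. The function names are hypothetical, and the residue masses are standard monoisotopic values rounded to five decimals.

```python
import re

# Monoisotopic residue masses in daltons, rounded to five decimals.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one H2O per free (linear) peptide


def tryptic_digest(sequence: str) -> list:
    """Split after K or R unless the next residue is P (assumed trypsin rule)."""
    # Zero-width split points require Python 3.7 or later.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(MONOISOTOPIC_RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


# Hypothetical example sequence; prints each fragment with its mass.
for fragment in tryptic_digest("AKGRPLK"):
    print(fragment, round(monoisotopic_mass(fragment), 4))
```

Comparing such predicted fragment masses with the masses observed by HPLC-MS/MS is one way a proposed sequence can be checked; real workflows also account for charge states, modifications and missed cleavages, which this sketch leaves out.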

Uncovering thousands of new HLA antigens and phosphopeptides with deep learning-based sequence-mask-search de novo peptide sequencing framework

10.1101/667527 ◽

2019 ◽

Author(s):

Korrawe Karunratanakul ◽

Hsin-Yao Tang ◽

David W. Speicher ◽

Ekapol Chuangsuwanich ◽

Sira Sriswasdi

Keyword(s):

Deep Learning ◽

Amino Acid ◽

De Novo ◽

Hla Antigens ◽

Peptide Identification ◽

Peptide Sequencing ◽

Amino Acid Sequences ◽

Mass Spectrometry Data ◽

Model Organisms ◽

Invaluable Tool

Abstract: Typical analyses of mass spectrometry data only identify amino acid sequences that exist in reference databases. This restricts the possibility of discovering new peptides such as those that contain uncharacterized mutations or originate from unexpected processing of RNAs and proteins. De novo peptide sequencing approaches address this limitation but often suffer from low accuracy and require extensive validation by experts. Here, we develop SMSNet, a deep learning-based hybrid de novo peptide sequencing framework that achieves >95% amino acid accuracy while retaining good identification coverage. Applications of SMSNet to landmark proteomics and peptidomics studies reveal over 10,000 previously uncharacterized HLA antigens and phosphopeptides and, in conjunction with database-search methods, expand the coverage of peptide identification by almost 30%. SMSNet's power to accurately identify new peptides would make it an invaluable tool for any future proteomics and peptidomics studies, especially cancer neoantigen discovery and proteome characterization of non-model organisms.
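
To make the phrase "amino acid accuracy" concrete, here is a simplified Python sketch that scores predicted sequences against reference sequences at the amino acid level and at the whole-peptide level. It is not SMSNet's evaluation code. It assumes a plain position-by-position comparison and normalises by reference length, whereas published benchmarks typically match residues by mass within a tolerance. The function name `sequencing_accuracy` is hypothetical.

```python
from typing import Iterable, Tuple


def sequencing_accuracy(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (amino_acid_accuracy, peptide_accuracy).

    Each pair is (predicted, reference). Residues are compared position by
    position, which is a simplifying assumption made for this sketch.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (predicted, reference) pair is required")
    total_residues = sum(len(reference) for _, reference in pairs)
    if total_residues == 0:
        raise ValueError("reference sequences must not all be empty")
    matched_residues = sum(
        sum(p == r for p, r in zip(predicted, reference))
        for predicted, reference in pairs
    )
    exact_peptides = sum(predicted == reference for predicted, reference in pairs)
    return matched_residues / total_residues, exact_peptides / len(pairs)


# One exact prediction and one with a single wrong residue (I read as L).
aa_acc, pep_acc = sequencing_accuracy([("PEPTIDE", "PEPTIDE"), ("PEPTLDE", "PEPTIDE")])
print(aa_acc, pep_acc)
```

The example shows why the two levels are reported separately: a single wrong residue barely moves the amino acid score but turns the whole peptide into a miss.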
