de novo peptide sequencing Latest Research Papers

Evolutionary Algorithms for Improving De Novo Peptide Sequencing

10.26686/wgtn.17145581.v1 ◽

2021 ◽

Author(s):

Samaneh Azari

Keyword(s):

Amino Acid ◽

De Novo ◽

Peptide Identification ◽

Peptide Sequencing ◽

De Novo Sequencing ◽

Amino Acid Sequences ◽

Full Length ◽

Multi Objective ◽

De Novo Peptide Sequencing ◽

De Novo Peptide

<p>De novo peptide sequencing algorithms have been developed to identify peptides in proteomics from tandem mass spectra (MS/MS), and can be used to identify and discover novel peptides and proteins for which no database is available. Despite improvements in MS instrumentation and de novo sequencing methods, a significant number of CID MS/MS spectra remain unassigned by current algorithms, often leading to low-confidence peptide assignments. Moreover, current algorithms often fail to construct completely matched sequences and produce only partial matches, so identifying full-length peptides remains challenging. Another major challenge is noise in MS/MS spectra, which makes the data highly imbalanced. Missing peaks, caused by incomplete MS fragmentation, also make it more difficult to infer a full-length peptide sequence. In addition, the large search space of all possible amino acid sequences for each spectrum leads to a high false discovery rate. This thesis focuses on improving the performance of current methods by developing new machine learning algorithms for three steps (preprocessing, sequence optimisation and post-processing) to allow more comprehensive interrogation of MS/MS datasets. From the machine learning point of view, the three steps can be addressed by solving different tasks such as classification, optimisation, and symbolic regression. Since Evolutionary Algorithms (EAs), as effective global search techniques, have shown promising results in solving these problems, this thesis investigates the capability of EAs to improve de novo peptide sequencing. In the preprocessing step, this thesis proposes an effective method based on genetic programming (GP) for classifying signal and noise peaks in highly imbalanced MS/MS spectra, with the aim of improving the reliability of peptide identification.
The results show that the proposed algorithm is the most stable classification method across various noise ratios, outperforming six other benchmark classification algorithms. The experimental results show a significant improvement in high-confidence peptide assignments to MS/MS spectra when the data is preprocessed by the proposed GP method. This thesis also proposes the first multi-objective GP approach for classifying peaks in MS/MS data, which aims to maximise both the accuracy of the minority class (signal peaks) and the accuracy of the majority class (noise peaks). The results show that the multi-objective GP method outperforms the single-objective GP algorithm and a popular multi-objective approach in terms of retaining more signal peaks and removing more noise peaks, and that it significantly improves the reliability of peptide identification. For the sequence optimisation step, this thesis proposes a method based on a genetic algorithm (GA) to solve the complex optimisation task of de novo peptide sequencing, aiming at constructing full-length sequences. The proposed method benefits from the GA's capability to search a large space of potential amino acid sequences for the most likely full-length sequence. The experimental results show that the proposed method outperforms the most commonly used de novo sequencing method at both the amino acid level and the peptide level. As a post-processing approach, this thesis also proposes a novel method for re-scoring and re-ranking the peptide spectrum matches (PSMs) produced by de novo peptide sequencing, aiming at minimising the false discovery rate. The proposed GP method evolves computer programs that perform regression and classification simultaneously in order to generate an effective scoring function for finding the correct PSMs among many incorrect ones.
The results show that the new GP-based PSM scoring function significantly improves the identification of full-length peptides when it is used to post-process the de novo sequencing results.</p>
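The two objectives named in the abstract (accuracy on the minority signal peaks and accuracy on the majority noise peaks) can be illustrated with a small sketch. This is not the thesis's GP method: the function names, the 0/1 label encoding and the example data are assumptions made for illustration only. It shows how the two per-class accuracies might be computed for a set of candidate classifiers, and how a Pareto front over them could be selected.

```python
from typing import List, Sequence, Tuple

Score = Tuple[float, float]  # (signal accuracy, noise accuracy)


def class_accuracies(labels: Sequence[int], preds: Sequence[int]) -> Score:
    """Per-class accuracy; assumed encoding: 1 = signal peak, 0 = noise peak."""
    signal = [p for y, p in zip(labels, preds) if y == 1]
    noise = [p for y, p in zip(labels, preds) if y == 0]
    signal_acc = sum(1 for p in signal if p == 1) / len(signal) if signal else 0.0
    noise_acc = sum(1 for p in noise if p == 0) / len(noise) if noise else 0.0
    return signal_acc, noise_acc


def dominates(a: Score, b: Score) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    better = any(x > y for x, y in zip(a, b))
    return no_worse and better


def pareto_front(scores: List[Score]) -> List[Score]:
    """Keep only the scores that no other candidate dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


# Hypothetical spectrum: 2 signal peaks among 8 noise peaks.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
all_noise = [0] * 10                          # 80% overall accuracy, finds no signal
balanced = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]     # keeps both signal peaks
poor = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]         # worse on both objectives

scores = [class_accuracies(labels, p) for p in (all_noise, balanced, poor)]
front = pareto_front(scores)
```

The `all_noise` candidate shows why a single overall accuracy is misleading on imbalanced spectra: it scores 80% overall while recovering no signal peaks, which is the kind of trade-off a two-objective formulation makes visible.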

Evolutionary Algorithms for Improving De Novo Peptide Sequencing

10.26686/wgtn.17145581 ◽

2021 ◽

Author(s):

Samaneh Azari

Keyword(s):

Amino Acid ◽

De Novo ◽

Peptide Identification ◽

Peptide Sequencing ◽

De Novo Sequencing ◽

Amino Acid Sequences ◽

Full Length ◽

Multi Objective ◽

De Novo Peptide Sequencing ◽

De Novo Peptide


Rigorous estimation of post-translational proteasomal splicing in the immunopeptidome

10.1101/2021.05.26.445792 ◽

2021 ◽

Author(s):

Kamil J Cygan ◽

Ehdieh Khaledian ◽

Lili Blumenberg ◽

Robert R Salzler ◽

Darshit Shah ◽

...

Keyword(s):

Immune Response ◽

Autoimmune Disease ◽

De Novo ◽

Critical Role ◽

Unknown Origin ◽

Peptide Sequencing ◽

De Novo Peptide Sequencing ◽

De Novo Peptide

Recently, de novo peptide sequencing has made it possible to gain new insights into the human immunopeptidome without relying on peptide databases, while identifying peptides of unknown origin. Many recent studies have attributed those peptides to post-translational proteasomal splicing. Here, we describe a peptide source assignment workflow that rigorously assigns the source of de novo sequenced peptides, and we find that the estimated extent of post-translational splicing in the immunopeptidome is much lower than previously reported. Our approach demonstrates that many peptides that were thought to be post-translationally spliced are likely linear peptides, and that many peptides that were thought to be trans-spliced could be cis-spliced. We believe our approach furthers the understanding of post-translationally spliced peptides and thus improves the characterization of the immunopeptidome, which plays a critical role in the immune response to antigens in cancer, autoimmune disease, and infections.
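The idea of preferring the simplest explanation for a peptide's origin can be sketched in a few lines. This is not the authors' workflow: the function names, the toy proteome, the minimum fragment length and the treatment of isoleucine and leucine as equivalent (they have the same mass) are all assumptions made for illustration. The sketch tries a linear match first, then a two-fragment match within one protein (cis), then a two-fragment match across proteins (trans).

```python
from typing import Dict


def normalise(seq: str) -> str:
    """Treat I and L as the same residue, since they are isobaric."""
    return seq.upper().replace("I", "L")


def assign_source(peptide: str, proteome: Dict[str, str], min_fragment: int = 2) -> str:
    """Return the simplest origin that explains the peptide (illustrative only)."""
    pep = normalise(peptide)
    proteins = [normalise(seq) for seq in proteome.values()]

    # 1. Linear: the peptide is a contiguous stretch of some protein.
    if any(pep in seq for seq in proteins):
        return "linear"

    # All ways to cut the peptide into two fragments of allowed length.
    splits = [
        (pep[:i], pep[i:])
        for i in range(min_fragment, len(pep) - min_fragment + 1)
    ]

    # 2. Cis-spliced: both fragments occur in the same protein.
    #    (Simplification: fragment order and overlap are not checked.)
    for left, right in splits:
        if any(left in seq and right in seq for seq in proteins):
            return "cis-spliced"

    # 3. Trans-spliced: the fragments occur in different proteins.
    for left, right in splits:
        if any(left in s for s in proteins) and any(right in s for s in proteins):
            return "trans-spliced"

    return "unassigned"


# Hypothetical two-protein proteome.
toy_proteome = {"P1": "MKTAYIAKQRQISFVK", "P2": "GGSHWEDNCP"}
calls = {p: assign_source(p, toy_proteome)
         for p in ("AYIAKQ", "MKTASFVK", "MKTAHWED", "WWWWWW")}
```

Checking the cheaper explanations first is what keeps a peptide with a linear match from being counted as spliced. A real workflow would also have to account for sequencing errors and short fragments that match by chance, which this sketch ignores.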

Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices

Nature Machine Intelligence ◽

10.1038/s42256-021-00304-3 ◽

2021 ◽

Author(s):

Rui Qiao ◽

Ngoc Hieu Tran ◽

Lei Xin ◽

Xin Chen ◽

Ming Li ◽

...

Keyword(s):

High Resolution ◽

De Novo ◽

Peptide Sequencing ◽

De Novo Peptide Sequencing ◽

Instrument Resolution ◽

De Novo Peptide

Deep Novo A+: Improving the Deep Learning Model for De Novo Peptide Sequencing with Additional Ion Types and Validation Set

Current Bioinformatics ◽

10.2174/1574893615666200204112347 ◽

2021 ◽

Vol 15 (8) ◽

pp. 949-954

Author(s):

Lei Di ◽

Yongxing He ◽

Yonggang Lu

Keyword(s):

Deep Learning ◽

De Novo ◽

Peptide Sequencing ◽

De Novo Sequencing ◽

Superior Performance ◽

De Novo Peptide Sequencing ◽

Validation Set ◽

De Novo Peptide ◽

Deep Learning Model

Background: De novo peptide sequencing is one of the key technologies in proteomics; it extracts peptide sequences directly from tandem mass spectrometry (MS/MS) spectra without any protein database. Because the accuracy and efficiency of de novo peptide sequencing can be affected by the quality of the MS/MS data, the DeepNovo method, which uses deep learning for de novo peptide sequencing, was introduced; it outperforms other state-of-the-art de novo sequencing methods. Objective: For superior performance and better generalization ability, additional ion types in the spectra should be considered and the DeepNovo model should be adaptive. Methods: Two improvements are introduced in the DeepNovo A+ method: a-ions are added to the spectral analysis, and a validation set is used to automatically determine the number of training epochs. Results: Experiments show that, compared with the DeepNovo method, the DeepNovo A+ method consistently improves the accuracy of de novo sequencing under different conditions. Conclusion: Adding a-ions and using a validation set effectively improves the performance of de novo sequencing.
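The two ingredients mentioned in the Methods can be illustrated with a minimal sketch. It is not the DeepNovo A+ code: the function names and the patience-based stopping rule are assumptions. The first function computes singly charged b-, y- and a-ion m/z values from standard monoisotopic masses, with each a-ion being the corresponding b-ion minus CO. The second shows one common way a validation set can decide how many epochs to train.

```python
from typing import Dict, List, Sequence

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
CO = 27.994915  # mass difference between a b-ion and its a-ion


def fragment_mz(peptide: str) -> Dict[str, List[float]]:
    """Singly charged a-, b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b, y = [], []
    prefix = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), from the N-terminus
        prefix += m
        b.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), from the C-terminus
        suffix += m
        y.append(suffix + WATER + PROTON)
    a = [mz - CO for mz in b]        # each a-ion is its b-ion minus CO
    return {"a": a, "b": b, "y": y}


def choose_epochs(val_losses: Sequence[float], patience: int = 2) -> int:
    """Return the 1-based epoch with the best validation loss, stopping
    once the loss has failed to improve for `patience` epochs in a row."""
    best_loss, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


ions = fragment_mz("PEPTIDE")
stop_at = choose_epochs([0.9, 0.7, 0.6, 0.65, 0.66, 0.5])
```

For `PEPTIDE`, `ions["b"][1]` is the b2 ion and `ions["y"][0]` is the y1 ion. In the loss sequence above, training stops after epoch 5 and epoch 3 is selected, so the later improvement at epoch 6 is never reached; this is the usual trade-off of a small patience value.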

A residual network for de novo peptide sequencing with attention mechanism

2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV) ◽

10.1109/icarcv50220.2020.9305327 ◽

2020 ◽

Author(s):

Zihang Liu ◽

Chunhui Zhao

Keyword(s):

De Novo ◽

Peptide Sequencing ◽

Attention Mechanism ◽

Residual Network ◽

De Novo Peptide Sequencing ◽

De Novo Peptide

De Novo Peptide Sequencing Reveals Many Cyclopeptides in the Human Gut and Other Environments

Cell Systems ◽

10.1016/j.cels.2019.11.007 ◽

2020 ◽

Vol 10 (1) ◽

pp. 99-108.e5 ◽

Cited By ~ 4

Author(s):

Bahar Behsaz ◽

Hosein Mohimani ◽

Alexey Gurevich ◽

Andrey Prjibelski ◽

Mark Fisher ◽

...

Keyword(s):

De Novo ◽

Peptide Sequencing ◽

Human Gut ◽

De Novo Peptide Sequencing ◽

De Novo Peptide

Neurotrophic properties and the de novo peptide sequencing of edible bird's nest extracts

Food Bioscience ◽

10.1016/j.fbio.2019.100466 ◽

2019 ◽

Vol 32 ◽

pp. 100466 ◽

Cited By ~ 1

Author(s):

Mei Yeng Yew ◽

Rhun Yian Koh ◽

Soi Moi Chye ◽

Syafiq Asnawi Zainal Abidin ◽

Iekhsan Othman ◽

...

Keyword(s):

De Novo ◽

Peptide Sequencing ◽

De Novo Peptide Sequencing ◽

Edible Bird’s Nest ◽

De Novo Peptide

Uncovering Thousands of New Peptides with Sequence-Mask-Search Hybrid De Novo Peptide Sequencing Framework

Molecular & Cellular Proteomics ◽

10.1074/mcp.tir119.001656 ◽

2019 ◽

Vol 18 (12) ◽

pp. 2478-2491 ◽

Cited By ~ 5

Author(s):

Korrawe Karunratanakul ◽

Hsin-Yao Tang ◽

David W. Speicher ◽

Ekapol Chuangsuwanich ◽

Sira Sriswasdi

Keyword(s):

De Novo ◽

Peptide Sequencing ◽

De Novo Peptide Sequencing ◽

De Novo Peptide

Evidence of structural variations of the gyroxin toxin from Crotalus durissus terrificus snake venom by means of the de novo peptide sequencing strategy

Toxicon ◽

10.1016/j.toxicon.2019.06.072 ◽

2019 ◽

Vol 168 ◽

pp. S13-S14

Author(s):

Amanda Almeida Resende ◽

Laudicéia Alves De Oliveira ◽

Rui Seabra Ferreira Júnior ◽

Daniel Carvalho Pimenta ◽

Lucilene Delazari Dos Santos

Keyword(s):

Snake Venom ◽

De Novo ◽

Peptide Sequencing ◽

Structural Variations ◽

Crotalus Durissus Terrificus ◽

De Novo Peptide Sequencing ◽

Sequencing Strategy ◽

Crotalus Durissus ◽

De Novo Peptide
