peptide sequencing
Recently Published Documents


TOTAL DOCUMENTS

466
(FIVE YEARS 55)

H-INDEX

56
(FIVE YEARS 5)

2021 ◽  
Author(s):  
◽  
Samaneh Azari

<p>De novo peptide sequencing algorithms have been developed for peptide identification in proteomics from tandem mass spectra (MS/MS); they can be used to identify and discover novel peptides and proteins for which no database is available. Despite improvements in MS instrumentation and de novo sequencing methods, a significant number of CID MS/MS spectra remain unassigned by current algorithms, often leading to low-confidence peptide assignments to the spectra. Moreover, current algorithms often fail to construct completely matched sequences and produce only partial matches, so identification of full-length peptides remains challenging. Another major challenge is noise in MS/MS spectra, which makes the data highly imbalanced. Missing peaks, caused by incomplete MS fragmentation, also make it more difficult to infer a full-length peptide sequence. In addition, the large search space of all possible amino acid sequences for each spectrum leads to a high false discovery rate. This thesis focuses on improving the performance of current methods by developing new machine learning algorithms for three steps (preprocessing, sequence optimisation and post-processing) to allow more comprehensive interrogation of MS/MS datasets. From the machine learning point of view, the three steps can be addressed by solving different tasks such as classification, optimisation, and symbolic regression. Since Evolutionary Algorithms (EAs), as effective global search techniques, have shown promising results in solving these problems, this thesis investigates the capability of EAs to improve de novo peptide sequencing. In the preprocessing step, this thesis proposes a genetic programming (GP) based method for classifying signal and noise peaks in highly imbalanced MS/MS spectra, with the aim of improving the reliability of peptide identification. 
The results show that the proposed algorithm is the most stable classification method across various noise ratios, outperforming six other benchmark classification algorithms. The experimental results show a significant improvement in high-confidence peptide assignments to MS/MS spectra when the data is preprocessed by the proposed GP method. This thesis also proposes the first multi-objective GP approach for classifying peaks in MS/MS data, which aims to maximise both the accuracy of the minority class (signal peaks) and the accuracy of the majority class (noise peaks). The results show that the multi-objective GP method outperforms the single-objective GP algorithm and a popular multi-objective approach in terms of retaining more signal peaks and removing more noise peaks. The multi-objective GP approach significantly improved the reliability of peptide identification. This thesis proposes a genetic algorithm (GA) based method to solve the complex optimisation task of de novo peptide sequencing, aiming at constructing full-length sequences. The proposed method benefits from the GA's capability of searching a large space of potential amino acid sequences to find the most likely full-length sequence. The experimental results show that the proposed method outperforms the most commonly used de novo sequencing method at both the amino acid level and the peptide level. This thesis also proposes a novel post-processing method for re-scoring and re-ranking the peptide-spectrum matches (PSMs) produced by de novo peptide sequencing, aiming at minimising the false discovery rate. The proposed GP method evolves computer programs that perform regression and classification simultaneously in order to generate an effective scoring function for finding the correct PSMs among many incorrect ones. 
The results show that the new GP-based PSM scoring function significantly improves the identification of full-length peptides when it is used to post-process the de novo sequencing results.</p>
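
To make the sequence optimisation step more concrete, the sketch below shows the kind of quantity a search over candidate sequences could score: how many theoretical fragment ions of a candidate peptide are found among the observed peaks. It is a toy illustration and not the thesis's GA fitness function or GP scoring function. The function names, the 0.02 m/z tolerance, and the restriction to singly charged b- and y-ions are all assumptions made for this example; the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in Da (I and L are isobaric).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, added to y-ions
PROTON = 1.007276   # charge carrier for singly charged ions


def fragment_mz(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # prefixes of length 1 .. n-1
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # suffixes of length 1 .. n-1
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def matched_fraction(peptide, peaks, tol=0.02):
    """Fraction of theoretical b/y ions with an observed peak within tol (hypothetical score)."""
    b_ions, y_ions = fragment_mz(peptide)
    theoretical = b_ions + y_ions
    if not theoretical:
        return 0.0
    hits = sum(any(abs(t - p) <= tol for p in peaks) for t in theoretical)
    return hits / len(theoretical)
```

A candidate whose fragments explain more of the spectrum gets a higher score, which also shows why noise peaks (spurious matches) and missing peaks (lost matches) make full-length sequences hard to rank correctly.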


2021 ◽  
Vol 71 (1) ◽  
Author(s):  
Charles E. Deutch ◽  
Amy M. Farden ◽  
Emily S. DiCesare

Abstract Purpose Gracilibacillus dipsosauri strain DD1 is a salt-tolerant Gram-positive bacterium that can hydrolyze the synthetic substrates o-nitrophenyl-β-d-galactopyranoside (β-ONP-galactose) and p-nitrophenyl-α-d-galactopyranoside (α-PNP-galactose). The goals of this project were to characterize the enzymes responsible for these activities and to identify the genes encoding them. Methods G. dipsosauri strain DD1 was grown in tryptic soy broth containing various carbohydrates at 37 °C with aeration. Enzyme activities in cell extracts and whole cells were measured colorimetrically by hydrolysis of synthetic substrates containing nitrophenyl moieties. Two enzymes with β-galactosidase activity and one with α-galactosidase activity were partially purified by ammonium sulfate fractionation, ion-exchange chromatography, and gel-filtration chromatography from G. dipsosauri. Coomassie Blue-stained bands corresponding to each activity were excised from nondenaturing polyacrylamide gels and subjected to peptide sequencing after trypsin digestion and HPLC/MS analysis. Result Formation of β-galactosidase and α-galactosidase activities was repressed by d-glucose and not induced by lactose or d-melibiose. β-Galactosidase I had hydrolytic and transgalactosylation activity with lactose as the substrate but β-galactosidase II showed no activity towards lactose. The α-galactosidase had hydrolytic and transgalactosylation activity with d-melibiose but not with d-raffinose. β-Galactosidase I had a lower Km with β-ONP-galactose as the substrate (0.693 mmol l−1) than β-galactosidase II (1.662 mmol l−1), was active at more alkaline pH, and was inhibited by the product d-galactose. β-Galactosidase II was active at more acidic pH, was partially inhibited by ammonium salts, and showed higher activity with α-PNP-arabinose as a substrate. 
The α-galactosidase had a low Km with α-PNP-galactose as the substrate (0.338 mmol l−1), a pH optimum of about 7, and was inhibited by chloride-containing salts. β-Galactosidase I activity was found to be due to the protein A0A317L6F0 (encoded by gene DLJ74_04930), β-galactosidase II activity to the protein A0A317KZG3 (encoded by gene DLJ74_12640), and the α-galactosidase activity to the protein A0A317KU47 (encoded by gene DLJ74_17745). Conclusions G. dipsosauri forms three intracellular enzymes with different physiological properties which are responsible for the hydrolysis of β-ONP-galactose and α-PNP-galactose. BLAST analysis indicated that similar β-galactosidases may be formed by G. ureilyticus, G. orientalis, and G. kekensis and similar α-galactosidases by these bacteria and G. halophilus.
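
As a rough guide to what the reported Km values mean, the sketch below evaluates the Michaelis-Menten rate law, v/Vmax = [S] / (Km + [S]), for the three enzymes. It assumes simple Michaelis-Menten kinetics with no inhibition, which is a simplification (the abstract reports product inhibition for β-galactosidase I). The function name and the 1 mmol l−1 substrate concentration are hypothetical choices for illustration; the Km values are those given in the abstract, measured with β-ONP-galactose for the β-galactosidases and α-PNP-galactose for the α-galactosidase.

```python
def fraction_of_vmax(substrate_mmol_l, km_mmol_l):
    """Michaelis-Menten rate as a fraction of Vmax (assumes no inhibition)."""
    return substrate_mmol_l / (km_mmol_l + substrate_mmol_l)


# Km values (mmol/l) as reported in the abstract.
KM = {
    "beta-galactosidase I": 0.693,
    "beta-galactosidase II": 1.662,
    "alpha-galactosidase": 0.338,
}

if __name__ == "__main__":
    substrate = 1.0  # mmol/l, an arbitrary illustrative concentration
    for enzyme, km in KM.items():
        print(f"{enzyme}: {fraction_of_vmax(substrate, km):.3f} of Vmax")
```

By definition each enzyme runs at half its maximal rate when the substrate concentration equals its Km, so at the same concentration of β-ONP-galactose the enzyme with the lower Km (β-galactosidase I) is closer to saturation than β-galactosidase II.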


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ahmed Abdul Kareem Najm ◽  
Ahmad Azfaralariff ◽  
Herryawan Ryadi Eziwar Dyari ◽  
Babul Airianah Othman ◽  
Muhammad Shahid ◽  
...  

Abstract A previous study has shown the antimicrobial activities of mucus protein extracted from Anabas testudineus. In this study, we characterize the anticancer activity of the A. testudineus antimicrobial peptides (AMPs). The mucus was extracted, fractionated, and subjected to antibacterial activity testing to confirm the fish's AMP production. The cytotoxic activity of each fraction was also determined. Fraction 2 (F2), which showed toxicity against MCF7 and MDA-MB-231 cells, was sent for peptide sequencing to identify the bioactive peptides. The two peptides were then synthetically produced and subjected to cytotoxicity assays to confirm their efficacy against cancer cell lines. The IC50 values for AtMP1 against MCF7 and MDA-MB-231 were 8.25 ± 0.14 μg/ml and 9.35 ± 0.25 μg/ml, respectively, while those for AtMP2 were 5.89 ± 0.14 μg/ml and 6.97 ± 0.24 μg/ml, respectively. Treatment with AtMP1 and AtMP2 for 48 h induced cell cycle arrest and apoptosis in breast cancer cells by upregulating p53, which in turn upregulated the pro-apoptotic BAX gene and downregulated the anti-apoptotic BCL-2 gene, consequently triggering the activation of caspase-3. This interaction was supported by docking analysis (QuickDBD, HPEPDOCK, and ZDOCK) and immunoprecipitation. This study provides new prospects for the development of highly effective and selective cancer therapeutics based on antimicrobial peptides.


Marine Drugs ◽  
2021 ◽  
Vol 19 (12) ◽  
pp. 668
Author(s):  
Andrei V. Grinchenko ◽  
Alex von Kriegsheim ◽  
Nikita A. Shved ◽  
Anna E. Egorova ◽  
Diana V. Ilyaskina ◽  
...  

C1q domain-containing (C1qDC) proteins are a group of biopolymers involved in the immune response as pattern recognition receptors (PRRs) that act in a lectin-like manner. A new protein, MkC1qDC, was purified from the hemolymph plasma of the bivalve mollusk Modiolus kurilensis, which is widespread in the Northwest Pacific. The isolation procedure included ammonium sulfate precipitation followed by affinity chromatography on pectin-Sepharose. The full-length MkC1qDC sequence was assembled using de novo mass-spectrometry peptide sequencing complemented with N-terminal Edman degradation. It comprised 176 amino acid residues with a molecular mass of 19 kDa and displayed high homology to bivalve C1qDC proteins. MkC1qDC demonstrated antibacterial properties against Gram-negative and Gram-positive strains. MkC1qDC binds, in a Ca2+-dependent manner, to a number of saccharides that are characterized by structural meta-similarity in acidic group enrichment of galactose and mannose derivatives incorporated in diversified molecular species of glycans. Alginate, κ-carrageenan, fucoidan, and pectin were found to be highly effective inhibitors of MkC1qDC activity. Yeast mannan, lipopolysaccharide (LPS), peptidoglycan (PGN) and mucin showed an inhibitory effect at concentrations three orders of magnitude greater than those of the most effective saccharides. MkC1qDC localized to the mussel hemal system and interstitial compartment. Intriguingly, MkC1qDC was found to suppress proliferation of human adenocarcinoma HeLa cells in a dose-dependent manner, indicating the biomedical potential of the MkC1qDC protein.


2021 ◽  
Vol 28 ◽  
Author(s):  
P. Boomathi Pandeswari ◽  
R. Nagarjuna Chary ◽  
A.S. Kamalanathan ◽  
Sripadi Prabhakar ◽  
Varatharajan Sabareesh

Background: Middle-down (MD) proteomics is an emerging approach for reliable identification of post-translational modifications and isoforms, as this approach focuses on proteolytic peptides containing > 25 - 30 amino acid residues (a.a.r.), which are longer than typical tryptic peptides. Such longer peptides can be obtained with the proteases AspN, GluC, and LysC. Additionally, some special proteases were developed specifically to enable the MD approach, e.g., OmpT, Sap9, etc. However, these proteases are expensive. Herein we report a cost-effective strategy, 'arginine modification-cum trypsin digestion', which can produce longer tryptic peptides resembling LysC peptides derived from proteins. Objective: To obtain proteolytic peptides that resemble LysC peptides by using trypsin, which is a less expensive protease. Methods: This strategy is based on the simple principle that trypsin cannot act at the C-termini of those arginines in proteins whose sidechain guanidine groups are modified by 1,2-cyclohexanedione or phenylglyoxal. Results: As a proof of concept, we demonstrate this strategy on four models: -casein (bovine), β-lactoglobulin (bovine), ovalbumin (chick) and transferrin (human), by electrospray ionization-mass spectrometry (ESI-MS) on a hybrid quadrupole time-of-flight instrument. From the ESI-MS of these models, we obtained several arginine-modified tryptic peptides whose lengths are in the range 30 - 60 a.a.r. The collision-induced dissociation MS/MS characteristics of some of the arginine-modified longer tryptic peptides are compared with those of the unmodified standard tryptic peptides. Conclusion: The strategy followed in this proof-of-concept study not only helps in obtaining longer tryptic peptides that mimic LysC proteolytic peptides, but also increases the probability of missed cleavages by trypsin. Hence, this method helps avoid obtaining very short peptides of < 5 - 10 a.a.r. 
Therefore, this is a cost-effective alternative to LysC proteolysis and, in turn, for those MD proteomic studies that utilize LysC. Additionally, this methodology can be fruitful for mass spectrometry based de novo protein and peptide sequencing.
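
The principle in the Methods can be sketched as an in-silico digestion: ordinary trypsin cleaves C-terminal to lysine (K) and arginine (R), whereas with every arginine blocked only the lysine sites remain, which is the LysC-like pattern. This is a minimal sketch under simplifying assumptions: complete digestion, complete arginine modification, and no special handling of proline after the cleavage site. The function name and the example sequence are hypothetical and are not taken from the study.

```python
def digest(sequence, cleave_after):
    """Cleave `sequence` C-terminal to every residue in `cleave_after`."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        # Cleaving after the final residue would only yield an empty peptide.
        if residue in cleave_after and i < len(sequence) - 1:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    protein = "MKWVTFISLLRLFSSAYSRGVFRRDTHK"  # hypothetical example sequence
    tryptic = digest(protein, cleave_after="KR")     # unmodified protein
    arg_blocked = digest(protein, cleave_after="K")  # arginines modified
    print(tryptic)       # six short peptides
    print(arg_blocked)   # two peptides, one of them 26 residues long
```

In this example the unmodified digest contains the single-residue peptide R and several peptides under 10 residues, while the arginine-blocked digest yields one 26-residue peptide, mirroring the shift towards longer peptides that the abstract describes.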


2021 ◽  
Author(s):  
Chatchapon Sricharoensuk ◽  
Tanupat Boonchalermvichien ◽  
Phijitra Muanwien ◽  
Poorichaya Somparn ◽  
Trairak Pisitkun ◽  
...  

Abstract Modern vaccine designs and studies of human leukocyte antigen (HLA)-mediated immune responses rely heavily on knowledge of HLA allele-specific binding motifs and on computational prediction of HLA-peptide binding affinity. Breakthroughs in HLA peptidomics have considerably expanded the databases of natural HLA ligands and enabled detailed characterization of HLA-peptide binding specificity. However, caution must be exercised when analyzing HLA peptidomics data, because identified peptides may be mass spectrometry contaminants or may bind only weakly to the HLA molecules. Here, a hybrid de novo peptide sequencing approach was applied to large-scale mono-allelic HLA peptidomics datasets to uncover new ligands and refine current knowledge of HLA binding motifs. Up to 12-40% of the peptidomics data were low-binding-affinity peptides with an arginine or a lysine at the C-terminus, which are likely to be tryptic peptide contaminants. Thousands of these peptides have been reported in a community database as legitimate ligands and might be erroneously used for training prediction models. Furthermore, unsupervised clustering of identified ligands revealed additional binding motifs for several HLA class I alleles and effectively isolated outliers that were experimentally confirmed to be false positives. Overall, our findings expand the knowledge of HLA binding specificity and call for more rigorous interpretation of HLA peptidomics data to ensure the validity of community HLA ligandome databases.
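
The contaminant criterion described above (weak predicted binding combined with a C-terminal arginine or lysine) can be sketched as a simple filter. This is an illustrative assumption-laden example, not the authors' pipeline: the `Ligand` record, the `predicted_ic50_nm` field, and the 500 nM cutoff are all hypothetical, and the affinity values would have to come from an external predictor.

```python
from dataclasses import dataclass


@dataclass
class Ligand:
    sequence: str
    predicted_ic50_nm: float  # hypothetical predicted affinity; lower means stronger binding


def is_candidate_tryptic_contaminant(ligand, weak_threshold_nm=500.0):
    """Flag peptides that end in K or R AND are predicted to bind weakly.

    The 500 nM threshold is an assumed illustrative cutoff, not a value from the study.
    """
    ends_like_tryptic = ligand.sequence[-1:] in ("K", "R")
    binds_weakly = ligand.predicted_ic50_nm > weak_threshold_nm
    return ends_like_tryptic and binds_weakly


def split_ligands(ligands, weak_threshold_nm=500.0):
    """Return (kept, flagged) lists; flagged peptides need manual review."""
    kept, flagged = [], []
    for ligand in ligands:
        if is_candidate_tryptic_contaminant(ligand, weak_threshold_nm):
            flagged.append(ligand)
        else:
            kept.append(ligand)
    return kept, flagged
```

Both conditions are required on purpose: some HLA class I alleles genuinely prefer a basic C-terminal residue, so a C-terminal K or R alone is not evidence of contamination.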


Foods ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 1803
Author(s):  
Lisa-Carina Class ◽  
Gesine Kuhnen ◽  
Sascha Rohn ◽  
Jürgen Kuballa

Deep learning is a trending field in bioinformatics. So far it has mostly been known for image processing and speech recognition, but it also shows promise for data processing in food analysis, especially foodomics, and more and more deep learning approaches are being used. This review presents an introduction to deep learning in the context of metabolomics and proteomics, focusing on the prediction of shelf-life, food authenticity, and food quality. Apart from the direct food-related applications, this review summarizes deep learning for peptide sequencing and its relevance to food analysis. The review further focuses on mass spectrometry (MS)-based approaches. As a result of the constant development and improvement of analytical devices, as well as more complex holistic research questions, especially with food as a diverse and complex matrix, there is a need for more effective methods for data processing. Deep learning might meet this need and offers the prospect of handling the vast amount and complexity of the data.


PROTEOMICS ◽  
2021 ◽  
pp. 2000319
Author(s):  
Monika Svecla ◽  
Giulia Garrone ◽  
Fiorenza Faré ◽  
Giacomo Aletti ◽  
Giuseppe Danilo Norata ◽  
...  
