Boosting MS1-only Proteomics with Machine Learning Allows 2000 Protein Identifications in Single-Shot Human Proteome Analysis Using 5 min HPLC Gradient

Author(s):  
Mark V. Ivanov ◽  
Julia A. Bubis ◽  
Vladimir Gorshkov ◽  
Daniil A. Abdrakhimov ◽  
Frank Kjeldsen ◽  
...  
2019 ◽  
Vol 9 (6) ◽  
pp. 1128 ◽  
Author(s):  
Yundong Li ◽  
Wei Hu ◽  
Han Dong ◽  
Xueyan Zhang

Aerial cameras, satellite remote sensing, or unmanned aerial vehicles (UAVs) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. Given the development of machine learning, researchers find that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods depends on large numbers of labeled samples. Given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, the current study proposes a damaged building assessment method using SSD with pretraining and data augmentation, and highlights the following aspects. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolutional auto-encoder (CAE) built from VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are utilized to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the proposed method's effectiveness. Experiments show that the pretraining strategy yields an improvement of 10% in overall accuracy compared with the SSD trained from scratch. These experiments also demonstrate that using data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively. Finally, the results are further verified on another dataset from Hurricane Irma, and it is concluded that the proposed method is feasible.
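The augmentation step named in aspect (3) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names are hypothetical, images are plain nested lists of grayscale values, and Gaussian blur is left out to keep the example dependency-free.

```python
import random

def mirror(image):
    """Flip a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]

def rotate90(image):
    """Rotate a 2-D image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to each pixel, clipped to 0-255."""
    return [
        [min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]

def augment(image, sigma=5.0, seed=0):
    """Return three augmented copies of one image (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the noise is reproducible
    return [mirror(image), rotate90(image), add_gaussian_noise(image, sigma, rng)]

tile = [[1, 2], [3, 4]]
copies = augment(tile)
```

For an object detector such as SSD, geometric transforms (mirroring, rotation) must also be applied to the bounding-box labels; that bookkeeping is omitted here.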


Author(s):  
Emanuele Polino ◽  
Alessandro Lumino ◽  
Adil Syed Rab ◽  
Giorgio Milani ◽  
Nicolò Spagnolo ◽  
...  

Open Biology ◽  
2013 ◽  
Vol 3 (2) ◽  
pp. 120148 ◽  
Author(s):  
John Y. Ng ◽  
Lies Boelen ◽  
Jason W. H. Wong

Protein 3-nitrotyrosine is a post-translational modification that commonly arises from the nitration of tyrosine residues. This modification has been detected under a wide range of pathological conditions and has been shown to alter protein function. Whether 3-nitrotyrosine is important in normal cellular processes or is likely to affect specific biological pathways remains unclear. Using GPS-YNO2, a recently described 3-nitrotyrosine prediction algorithm, a set of predictions for nitrated residues in the human proteome was generated. In total, 9.27 per cent of the proteome was predicted to be nitratable (27 922/301 091). By matching the predictions against a set of curated and experimentally validated 3-nitrotyrosine sites in human proteins, it was found that GPS-YNO2 is able to predict 73.1 per cent (404/553) of these sites. Furthermore, of these sites, 42 have been shown to be nitrated endogenously, with 85.7 per cent (36/42) of these predicted to be nitrated. This demonstrates the feasibility of using the predicted dataset for a whole proteome analysis. A comprehensive bioinformatics analysis was subsequently performed on predicted and all experimentally validated nitrated tyrosine. This found mild but specific biophysical constraints that affect the susceptibility of tyrosine to nitration, and these may play a role in increasing the likelihood of 3-nitrotyrosine to affect processes, including phosphorylation and DNA binding. Furthermore, examining the evolutionary conservation of predicted 3-nitrotyrosine showed that, relative to non-nitrated tyrosine residues, 3-nitrotyrosine residues are generally less conserved. This suggests that, at least in the majority of cases, 3-nitrotyrosine is likely to have a deleterious effect on protein function and less likely to be important in normal cellular function.
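The percentages quoted in this abstract follow directly from the counts given in parentheses. As a quick arithmetic check (the variable names are illustrative only):

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

# Counts taken from the abstract above.
predicted_nitratable = percent(27922, 301091)  # predicted nitratable fraction of the proteome
validated_recovered = percent(404, 553)        # validated sites recovered by GPS-YNO2
endogenous_recovered = percent(36, 42)         # endogenous sites predicted as nitrated

for label, value in [
    ("predicted nitratable", predicted_nitratable),
    ("validated sites recovered", validated_recovered),
    ("endogenous sites recovered", endogenous_recovered),
]:
    print(f"{label}: {value:.2f} per cent")
```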


2013 ◽  
Vol 12 (1) ◽  
pp. 97-105 ◽  
Author(s):  
Kyung-Hoon Kwon ◽  
Jin Young Kim ◽  
Se-Young Kim ◽  
Hye Kyeong Min ◽  
Hyoung-Joo Lee ◽  
...  

2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Takeru Kusumoto ◽  
Kosuke Mitarai ◽  
Keisuke Fujii ◽  
Masahiro Kitagawa ◽  
Makoto Negoro

The kernel trick allows us to employ a high-dimensional feature space for a machine learning task without explicitly storing features. Recently, the idea of utilizing quantum systems for computing kernel functions using interference has been demonstrated experimentally. However, the dimension of the feature spaces in those experiments has been smaller than the number of data, which makes them lose their computational advantage over explicit methods. Here we show the first experimental demonstration of a quantum kernel machine that achieves a scheme where the dimension of the feature space greatly exceeds the number of data, using 1H nuclear spins in a solid. The use of NMR allows us to obtain the kernel values with a single-shot experiment. We employ engineered dynamics correlating 25 spins, which is equivalent to using a feature space with a dimension over 10^15. This work presents quantum machine learning using one of the largest quantum systems to date.
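The kernel trick mentioned at the start of this abstract can be shown with a small classical example. This is a sketch of the general idea only, not of the NMR experiment: a degree-2 polynomial kernel gives the same value as an inner product in an explicit feature space of dimension d^2, without ever building that space. The function names are hypothetical.

```python
from itertools import product

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, y):
    """Kernel value k(x, y) = (x . y)^2, computed in the input space."""
    return dot(x, y) ** 2

def explicit_features(x):
    """Explicit feature map: all d*d pairwise products x_i * x_j."""
    return [a * b for a, b in product(x, repeat=2)]

x = [1, 2, 3]
y = [4, 5, 6]

implicit = poly2_kernel(x, y)                               # d multiplications
explicit = dot(explicit_features(x), explicit_features(y))  # d^2 features
```

Both routes give the same number, but the explicit one needs storage that grows with the feature dimension, which is why a feature space far larger than the data set is only practical through the kernel.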


2020 ◽  
Author(s):  
Mark V. Ivanov ◽  
Julia A. Bubis ◽  
Vladimir Gorshkov ◽  
Daniil A. Abdrakhimov ◽  
Frank Kjeldsen ◽  
...  

Proteome-wide analyses most often rely on tandem mass spectrometry, which imposes considerable instrument time consumption, one of the main obstacles to a broader acceptance of proteomics in biomedical and clinical research. Recently, we presented a fast proteomic method termed DirectMS1 based on MS1-only mass spectra acquisition and data processing. The method allowed the proteome-wide analysis to be squeezed into a time frame of a few minutes at a depth of quantitative proteome coverage of 1000 proteins at 1% FDR. In this work, to further increase the capabilities of the DirectMS1 method, we explored the opportunities presented by the recent progress in the machine learning area and applied the LightGBM tree-based learning algorithm to the scoring of peptide-feature matches when processing MS1 spectra. Further, we integrated the peptide feature identification algorithm of DirectMS1 with the recently introduced peptide retention time prediction utility, DeepLC. Additional approaches to improve performance of the DirectMS1 method are discussed and demonstrated, such as FAIMS coupled to the Orbitrap mass analyzer. As a result of all improvements to DirectMS1, we succeeded in identifying more than 2000 proteins at 1% FDR from the HeLa cell line in a 5 minute LC-MS1 analysis.
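The abstract reports results at 1% FDR but does not say how the FDR is estimated. The sketch below assumes a common target-decoy estimate (decoys divided by targets above a score cutoff) applied to already-scored matches. The scores stand in for the output of any classifier, the function name is hypothetical, and none of this is taken from the DirectMS1 code.

```python
def filter_at_fdr(matches, max_fdr=0.01):
    """Keep target matches up to the deepest rank where decoys/targets <= max_fdr.

    matches: list of (score, is_decoy) tuples; higher score means better match.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked entries to keep
    for rank, (_score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = rank
    return [m for m in ranked[:cutoff] if not m[1]]

# Toy input: (score, is_decoy). A loose 50% threshold keeps the example short.
toy = [(5, False), (4, False), (3, True), (2, True), (1, False)]
accepted = filter_at_fdr(toy, max_fdr=0.5)
```

With a real scoring model, the scores would come from the trained classifier and the threshold would be set to 0.01 to match the 1% FDR quoted above.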


2019 ◽  
Author(s):  
Jarrod J Sandow ◽  
Giuseppe Infusini ◽  
Laura F Dagley ◽  
Rune Larsen ◽  
Andrew I Webb

Recent advances in mass spectrometry technology have seen remarkable increases in proteomic sequencing speed, while improvements to dynamic range have remained limited. An exemplar of this is the new timsTOF Pro instrument, which, thanks to its trapped ion mobility, pushes effective fragmentation rates beyond 100 Hz and provides accurate CCS values as well as impressive sensitivity. Established data dependent methodologies underutilize these advances by relying on long analytical columns and extended LC gradients to achieve comprehensive proteome coverage from biological samples. Here we describe the implementation of methods for short packed emitter columns that fully utilize instrument speed and CCS values by combining rapid generation of deep peptide libraries with enhanced matching of single shot data dependent sample analysis. Impressively, with only a 17 minute gradient separation (50 samples per day), the combination of high performance chromatography and CCS enhanced library based matching resulted in an average of 6,690 protein identifications within individual samples, and 7,797 proteins cumulatively across replicates from HeLa cell tryptic digests. Additionally, an ultra-high throughput setup utilizing 5 min gradients (180 samples per day) yielded an average of 2,800 protein identifications within individual samples and 4,254 proteins cumulatively across replicates. These workflows are simple to implement on available technology and do not require complex software solutions or custom made consumables to achieve high throughput and deep proteome analysis from biological samples.
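The throughput figures quoted above imply a total cycle time per sample that is longer than the gradient itself. A short calculation makes the gap explicit, assuming continuous 24-hour operation (an assumption; the abstract does not state the basis for "samples per day"):

```python
MINUTES_PER_DAY = 24 * 60  # assumes round-the-clock acquisition

def cycle_minutes(samples_per_day):
    """Total time available per sample, in minutes."""
    return MINUTES_PER_DAY / samples_per_day

def overhead_minutes(samples_per_day, gradient_minutes):
    """Per-sample time not spent on the gradient (loading, washing, equilibration)."""
    return cycle_minutes(samples_per_day) - gradient_minutes

# Values from the abstract: (samples per day, gradient length in minutes).
for samples, gradient in [(50, 17), (180, 5)]:
    print(
        f"{samples} samples/day: cycle {cycle_minutes(samples):.1f} min, "
        f"overhead {overhead_minutes(samples, gradient):.1f} min"
    )
```

Under that assumption the 17 minute method has a cycle of 28.8 minutes per sample and the 5 minute method a cycle of 8 minutes.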


2015 ◽  
Vol 31 (9) ◽  
pp. 1411-1419 ◽  
Author(s):  
Fuyi Li ◽  
Chen Li ◽  
Mingjun Wang ◽  
Geoffrey I. Webb ◽  
Yang Zhang ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Zhe Zhang ◽  
Xi Yang ◽  
Xiaobiao Huang ◽  
Junjie Li ◽  
Timur Shaftan ◽  
...  

To harness the full potential of ultrafast electron diffraction (UED) and microscopy (UEM), we must accurately know the electron beam properties, such as emittance, energy spread, spatial-pointing jitter, and shot-to-shot energy fluctuation. Owing to the inherent fluctuations in UED/UEM instruments, obtaining such detailed knowledge requires real-time characterization of the beam properties for each electron bunch. While diagnostics of these properties exist, they are often invasive, and many of them cannot operate at a high repetition rate. Here, we present a technique to overcome such limitations. Employing a machine learning (ML) strategy, we can accurately predict electron beam properties for every shot using only parameters that are easily recorded at high repetition rate by the detector while the experiments are ongoing, by training a model on a small set of fully diagnosed bunches. Applying ML as real-time noninvasive diagnostics could enable some new capabilities, e.g., online optimization of the long-term stability and fine single-shot quality of the electron beam, filtering the events and making online corrections of the data for time-resolved UED, otherwise impossible. This opens the possibility of fully realizing the potential of high repetition rate UED and UEM for life science and condensed matter physics applications.
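The workflow described here (fit a model on a few fully diagnosed bunches, then predict a beam property for every shot from a cheaply recorded parameter) can be sketched with a deliberately simple stand-in. The abstract does not name the model used, so ordinary least squares on one feature is an assumption, and the numbers below are synthetic, not measured data.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    """Apply the fitted line to a new shot's recorded parameter."""
    slope, intercept = model
    return slope * x + intercept

# Synthetic training set: a recorded parameter and the invasively
# diagnosed property for a handful of bunches (hypothetical values).
recorded = [0.0, 1.0, 2.0, 3.0]
diagnosed = [1.0, 3.0, 5.0, 7.0]  # follows y = 2x + 1 exactly

model = fit_line(recorded, diagnosed)
estimate = predict(model, 4.0)  # non-invasive estimate for a new shot
```

A real application would use many recorded parameters and a more flexible model; the point is only that the invasive diagnostic is needed for the training shots, not for every shot.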


2021 ◽  
Author(s):  
Abhishek Sengupta ◽  
G. Naresh ◽  
Astha Mishra ◽  
Diksha Parashar ◽  
Priyanka Narad
