ionbot: a novel, innovative and sensitive machine learning approach to LC-MS/MS peptide identification

2021 ◽  
Author(s):  
Sven Degroeve ◽  
Ralf Gabriels ◽  
Kevin Velghe ◽  
Robbin Bouwmeester ◽  
Natalia Tichshenko ◽  
...  

Mass spectrometry-based proteomics generates vast amounts of signal data that require computational interpretation to obtain peptide identifications. Dozens of algorithms for this task exist, but all exploit only part of the acquired data to judge a peptide-to-spectrum match (PSM), ignoring important information such as the observed retention time and the fragment ion peak intensity pattern. Moreover, only a few identification algorithms allow open modification searches, which can substantially increase peptide identifications. We therefore introduce ionbot, a novel open modification search engine that is the first to fully merge machine learning with peptide identification. This core innovation makes it possible to include a much larger range of experimental data in PSM scoring, and even to adapt this scoring to the specifics of the data itself. As a result, ionbot substantially increases PSM confidence for open searches, and enables a further increase in peptide identification rate of up to 30% by also considering highly plausible, lower-ranked, co-eluting matches for a fragmentation spectrum. Moreover, the exclusive use of machine learning for scoring means that any future improvements to predictive models of peptide behavior will also result in more sensitive and accurate peptide identification.
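
To make the idea of scoring a PSM with retention time and fragment intensity information concrete, the following is a minimal sketch and not ionbot's actual feature set or scoring model, neither of which is described in the abstract. It assumes that predicted fragment intensities and a predicted retention time are already available from some external model; the function names `pearson` and `psm_features`, and all numbers, are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def psm_features(observed_intensities, predicted_intensities, observed_rt, predicted_rt):
    """Build two illustrative PSM features (hypothetical, not ionbot's own):
    agreement between observed and predicted fragment intensities, and the
    absolute retention time error in the same units as the inputs."""
    return {
        "intensity_correlation": pearson(observed_intensities, predicted_intensities),
        "rt_error": abs(observed_rt - predicted_rt),
    }


# Toy values for one candidate peptide matched to one spectrum.
features = psm_features(
    observed_intensities=[0.1, 0.5, 1.0, 0.3],
    predicted_intensities=[0.2, 0.6, 0.9, 0.2],
    observed_rt=35.2,
    predicted_rt=33.7,
)
```

Features of this kind could then be passed to any classifier that separates correct from incorrect matches; which features and which classifier ionbot uses is not stated here.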


2021 ◽  
Author(s):  
Md. Zahangir Alam ◽  
Albino Simonetti ◽  
Rafaelle Billantino ◽  
Nick Tayler ◽  
Chris Grainge ◽  
...  

Self-monitoring can play a vital role in asthma control by supporting proper, timely treatment. Existing methods (such as peak flow meters and smart spirometers) require special equipment and are not always used by patients. Voice recordings can serve as surrogate measures of lung function to assess asthma; this approach has good potential for self-monitoring and could be integrated into telehealth platforms. This study aims to apply a machine learning approach to predict lung function from recorded voice for asthma patients. A threshold-based mechanism was designed to separate speech and breathing in the recordings (323 recordings from 26 participants), and features extracted from these were combined with biological attributes and lung function (percentage predicted forced expiratory volume in 1 second, FEV1%). Three types of predictive model were developed: (a) regression models to predict lung function, (b) multi-class classification models to predict the severity, and (c) binary classification models to predict abnormality. Random Forest (RF), Support Vector Machine (SVM), and Linear Regression (LR) algorithms were implemented to develop these predictive models. Training and test samples were separated (70%:30%, using balanced partitioning). Features were normalised, and 10-fold cross-validation was used to measure the models' training performance on the training samples. Models were then run on the test samples to measure final performance. Among the regression models, the RF-based model performed best, with the lowest root mean square error (10.86) and a mean absolute score of 11.47. In predicting the severity of lung function, the SVM-based model performed best, with 73.20% accuracy. Among the binary classification models for predicting abnormality of lung function, the RF-based model performed best (accuracy = 0.85, F1-score = 0.84, and area under the receiver operating characteristic curve = 0.88).
The proposed machine learning approach can predict lung function (in terms of FEV1%) from recorded voice files better than other published approaches. These models can be extended to predict both the severity and abnormality of lung function with reasonable accuracy. This technique could be used to develop future telehealth solutions, including smartphone-based applications, which have the potential to aid decision making and self-monitoring in asthma.
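
The abstract does not say which quantity the threshold-based mechanism compares against, so the sketch below assumes a simple short-time energy criterion: frames whose mean squared amplitude exceeds a threshold are labelled as speech, and quieter frames as breathing. The function names, frame size, and threshold value are hypothetical choices for illustration only.

```python
def frame_energies(samples, frame_size):
    """Mean squared amplitude of each non-overlapping frame.
    Trailing samples that do not fill a whole frame are ignored."""
    return [
        sum(s * s for s in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def label_frames(samples, frame_size, threshold):
    """Label a frame 'speech' if its energy exceeds the threshold, else 'breath'.
    The energy criterion is an assumption, not the study's stated method."""
    return [
        "speech" if energy > threshold else "breath"
        for energy in frame_energies(samples, frame_size)
    ]


# Toy signal: a loud segment, a quiet segment, then another loud segment.
signal = (
    [0.8, -0.7, 0.9, -0.8]
    + [0.05, -0.04, 0.06, -0.05]
    + [0.7, -0.9, 0.8, -0.6]
)
labels = label_frames(signal, frame_size=4, threshold=0.1)
```

In practice the threshold would have to be tuned per recording device and environment, and real recordings would use frames of tens of milliseconds, not four samples.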


2019 ◽  
Vol 19 (11) ◽  
pp. 1772-1781 ◽  
Author(s):  
Summer S. Han ◽  
Tej D. Azad ◽  
Paola A. Suarez ◽  
John K. Ratliff

2021 ◽  
Author(s):  
Timothy I. Anderson ◽  
Yunan Li ◽  
Anthony R. Kovscek

Abstract Heavy oil resources are becoming increasingly important for the global oil supply, and consequently there has been renewed interest in techniques for extracting heavy oil. Among these, in-situ combustion (ISC) has tremendous potential for late-stage heavy oil fields, as well as high viscosity, very deep, or other unconventional reservoirs. A critical step in evaluating the use of ISC in a potential project is developing an accurate chemical reaction model to employ for larger-scale simulations. Such models can be difficult to calibrate, however, which in turn can lead to large errors in upscaled simulations. Data-driven models of ISC kinetics overcome these issues by forgoing the calibration step and predicting kinetics directly from laboratory data. In this work, we introduce the Non-Arrhenius Machine Learning Approach (NAMLA). NAMLA is a machine learning-based method for predicting O2 consumption in heavy oil combustion directly from ramped temperature oxidation (RTO) experimental data. Our model treats the O2 consumption as a function of only temperature and total O2 conversion and uses a locally-weighted linear regression model to predict the conversion rate at a query point. We apply this method to simulated and experimental data from heavy oil samples and compare its ability to predict O2 consumption curves with that of a previously proposed interpolation-based method. Results show that the presented method performs better than previously proposed interpolation models when the available experimental data are very sparse or the query point lies outside the range of RTO experiments in the dataset. When the available data are sufficiently dense or the query point is within the range of RTO curves in the training set, linear interpolation has comparable or better accuracy than the proposed method. The biggest advantage of the proposed method is that it is able to compute confidence intervals for experimentally measured or estimated O2 consumption curves.
We believe that future methods will be able to combine the efficiency and accuracy of interpolation-based methods with the statistical properties of the proposed machine learning approach to better characterize and predict heavy oil combustion.
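
As an illustration of locally-weighted linear regression over (temperature, conversion) pairs, the sketch below fits a weighted linear model around a query point and evaluates it there. It is not the NAMLA implementation: the Gaussian kernel, the bandwidth, the assumption that both inputs are pre-scaled to comparable ranges, and the names `solve_linear_system` and `lwlr_predict` are all illustrative choices. Confidence intervals, which the abstract highlights, are not computed.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes the matrix is non-singular (no check is performed)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lwlr_predict(points, rates, query, bandwidth):
    """Predict the rate at `query` = (temperature, conversion).
    Each training point is weighted by a Gaussian kernel of its distance to
    the query, then weighted least squares is solved via normal equations."""
    xtwx = [[0.0] * 3 for _ in range(3)]
    xtwy = [0.0] * 3
    for (temp, conv), rate in zip(points, rates):
        dist2 = (temp - query[0]) ** 2 + (conv - query[1]) ** 2
        w = math.exp(-dist2 / (2 * bandwidth ** 2))
        x = (1.0, temp, conv)  # intercept, temperature, conversion
        for i in range(3):
            xtwy[i] += w * x[i] * rate
            for j in range(3):
                xtwx[i][j] += w * x[i] * x[j]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[0] + beta[1] * query[0] + beta[2] * query[1]


# Toy data on a grid of scaled temperature and conversion values, generated
# from a made-up linear rate law so the prediction is easy to check.
points = [(t, c) for t in (0.0, 0.25, 0.5, 0.75, 1.0) for c in (0.0, 0.5, 1.0)]
rates = [0.2 + 0.5 * t - 0.3 * c for t, c in points]
prediction = lwlr_predict(points, rates, query=(0.4, 0.3), bandwidth=0.5)
```

Because the toy data are exactly linear, the local fit reproduces the generating function at the query point; with real RTO data the bandwidth controls how local the fit is.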


Author(s):  
Giulia Lorenzoni ◽  
Nicolò Sella ◽  
Annalisa Boscolo ◽  
Danila Azzolina ◽  
Patrizia Bartolotta ◽  
...  

Abstract Background Since the beginning of the coronavirus disease 2019 (COVID-19) pandemic, the development of predictive models has sparked considerable interest due to the initial lack of knowledge about diagnosis, treatment, and prognosis. The present study aimed at developing a model, through a machine learning approach, to predict intensive care unit (ICU) mortality in COVID-19 patients based on predefined clinical parameters. Results Observational multicenter cohort study. All adult COVID-19 patients admitted to 25 ICUs belonging to the VENETO ICU network (February 28th 2020 to April 4th 2021) were enrolled. Patients admitted to the ICUs before 4th March 2021 were used for model training (“training set”), while patients admitted after the 5th of March 2021 were used for external validation (“test set 1”). A further group of patients (“test set 2”), admitted to the ICU of IRCCS Ca’ Granda Ospedale Maggiore Policlinico of Milan, was used for external validation. A SuperLearner machine learning algorithm was applied for model development, and both internal and external validation were performed. Clinical variables available for the model were (i) age, gender, sequential organ failure assessment score, Charlson Comorbidity Index score (not adjusted for age), Palliative Performance Score; (ii) need for invasive mechanical ventilation, non-invasive mechanical ventilation, O2 therapy, vasoactive agents, extracorporeal membrane oxygenation, continuous veno-venous hemofiltration, tracheostomy, re-intubation, prone position during ICU stay; and (iii) re-admission to the ICU. A total of 1293 patients (80%) were included in the “training set”, while 124 (8%) and 199 (12%) patients were included in “test set 1” and “test set 2”, respectively. Three different predictive models were developed, each including a different set of clinical variables.
The three models showed similar predictive performance, with a training balanced accuracy that ranged between 0.72 and 0.90, while the cross-validation performance ranged from 0.75 to 0.85. Age was the leading predictor for all the considered models. Conclusions Our study provides a useful and reliable tool, through a machine learning approach, for predicting ICU mortality in COVID-19 patients. In all the estimated models, age was the variable with the greatest impact on mortality.
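
Balanced accuracy, the metric reported above, is the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy. The sketch below computes it for binary labels; the label coding (1 = died in ICU, 0 = survived) and the toy predictions are hypothetical and unrelated to the study's data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0).
    Assumes binary 0/1 labels and at least one example of each class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy outcome labels and predictions for ten hypothetical patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
score = balanced_accuracy(y_true, y_pred)
```

Here sensitivity is 3/4 and specificity is 4/6, so the balanced accuracy is their average, which differs from the plain accuracy of 7/10 on the same predictions.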

