Machine Learning for Mass Spectrometry Data Analysis in Proteomics

2020 ◽  
Vol 17 ◽  
Author(s):  
Juntao Li ◽  
Kanglei Zhou ◽  
Bingyu Mu

With the rapid development of high-throughput techniques, mass spectrometry has been widely used for large-scale protein analysis. To identify existing proteins, discover biomarkers, and diagnose and prognose diseases, machine learning methods are applied in mass spectrometry data analysis. This paper reviews the applications of five kinds of machine learning methods to mass spectrometry data analysis from an algorithmic point of view: support vector machine, decision tree, random forest, naive Bayesian classifier, and deep learning.

Metabolites ◽  
2020 ◽  
Vol 10 (6) ◽  
pp. 243 ◽  
Author(s):  
Ulf W. Liebal ◽  
An N. T. Phan ◽  
Malvika Sudhakar ◽  
Karthik Raman ◽  
Lars M. Blank

The metabolome of an organism depends on environmental factors and intracellular regulation and provides information about the physiological conditions. Metabolomics helps to understand disease progression in clinical settings or estimate metabolite overproduction for metabolic engineering. The most popular analytical metabolomics platform is mass spectrometry (MS). However, MS metabolome data analysis is complicated, since metabolites interact nonlinearly, and the data structures themselves are complex. Machine learning methods have become immensely popular for statistical analysis due to the inherent nonlinear data representation and the ability to process large and heterogeneous data rapidly. In this review, we address recent developments in using machine learning for processing MS spectra and show how machine learning generates new biological insights. In particular, supervised machine learning has great potential in metabolomics research because of the ability to supply quantitative predictions. We review here commonly used tools, such as random forest, support vector machines, artificial neural networks, and genetic algorithms. During processing steps, the supervised machine learning methods help with peak picking, normalization, and missing data imputation. For knowledge-driven analysis, machine learning contributes to biomarker detection, classification and regression, biochemical pathway identification, and carbon flux determination. Of particular relevance is the combination of different omics data to identify the contributions of the various regulatory levels. Our overview of the recent publications also highlights that data quality determines analysis quality, but also adds to the challenge of choosing the right model for the data. Machine learning methods applied to MS-based metabolomics ease data analysis and can support clinical decisions, guide metabolic engineering, and stimulate fundamental biological discoveries.


2021 ◽  
Author(s):  
Qifei Zhao ◽  
Xiaojun Li ◽  
Yunning Cao ◽  
Zhikun Li ◽  
Jixin Fan

Abstract. Collapsibility of loess is a significant factor affecting engineering construction in loess areas, and testing the collapsibility of loess is costly. In this study, a total of 4,256 loess samples are collected from the north, east, west and middle regions of Xining. 70% of the samples are used to generate the training data set, and the rest are used to generate the verification data set, so as to construct and validate the machine learning models. The six most important factors are selected from thirteen factors by using grey relational analysis and multicollinearity analysis: burial depth, water content, specific gravity of soil particles, void rate, geostatic stress and plasticity limit. In order to predict the collapsibility of loess, four machine learning methods are studied and compared: Support Vector Machine (SVM), Random Subspace Based Support Vector Machine (RSSVM), Random Forest (RF) and Naïve Bayes Tree (NBTree). The receiver operating characteristic (ROC) curve indicators, standard error (SD) and 95% confidence interval (CI) are used to verify and compare the models in different research areas. The results show that the RF model is the most efficient in predicting the collapsibility of loess in Xining, and its average AUC is above 80%, so it can be used in engineering practice.
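The 70/30 split and the AUC indicator mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `split_70_30` and `auc`, the fixed random seed, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed with the standard rank (Mann-Whitney) formulation rather than any specific library.

```python
import random


def split_70_30(samples, seed=0):
    """Shuffle samples and split them into 70% training / 30% verification.

    The seed is a hypothetical choice so that the split is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: 1 for the positive class (e.g. collapsible), 0 otherwise.
    scores: model outputs, where higher means "more likely positive".
    A positive/negative pair with equal scores counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Sample indices stand in for the 4,256 loess samples.
    train, verify = split_70_30(range(4256))
    print(len(train), len(verify))
    # Toy scores, invented for illustration only.
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch; a sort-based rank computation would be the usual choice for data sets of this size.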


2016 ◽  
Vol 16 (13) ◽  
pp. 8181-8191 ◽  
Author(s):  
Jani Huttunen ◽  
Harri Kokkola ◽  
Tero Mielonen ◽  
Mika Esa Juhani Mononen ◽  
Antti Lipponen ◽  
...  

Abstract. In order to have a good estimate of the current forcing by anthropogenic aerosols, knowledge on past aerosol levels is needed. Aerosol optical depth (AOD) is a good measure for aerosol loading. However, dedicated measurements of AOD are only available from the 1990s onward. One option to lengthen the AOD time series beyond the 1990s is to retrieve AOD from surface solar radiation (SSR) measurements taken with pyranometers. In this work, we have evaluated several inversion methods designed for this task. We compared a look-up table method based on radiative transfer modelling, a non-linear regression method and four machine learning methods (Gaussian process, neural network, random forest and support vector machine) with AOD observations carried out with a sun photometer at an Aerosol Robotic Network (AERONET) site in Thessaloniki, Greece. Our results show that most of the machine learning methods produce AOD estimates comparable to the look-up table and non-linear regression methods. All of the applied methods produced AOD values that corresponded well to the AERONET observations with the lowest correlation coefficient value being 0.87 for the random forest method. While many of the methods tended to slightly overestimate low AODs and underestimate high AODs, neural network and support vector machine showed overall better correspondence for the whole AOD range. The differences in producing both ends of the AOD range seem to be caused by differences in the aerosol composition. High AODs were in most cases those with high water vapour content which might affect the aerosol single scattering albedo (SSA) through uptake of water into aerosols. Our study indicates that machine learning methods benefit from the fact that they do not constrain the aerosol SSA in the retrieval, whereas the LUT method assumes a constant value for it. 
This would also mean that machine learning methods could have potential in reproducing AOD from SSR even though SSA would have changed during the observation period.
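The correlation coefficient used above to compare retrieved and observed AOD can be sketched as follows. This is an illustrative computation only: the function name `pearson_r` and the AOD values are hypothetical, and the abstract does not state which correlation estimator the authors used (Pearson's r is assumed here).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values: the retrieval overestimates low AOD and
    # underestimates high AOD, yet is perfectly linear in the observation.
    observed = [0.1, 0.2, 0.3, 0.4]
    retrieved = [0.15, 0.22, 0.29, 0.36]
    print(pearson_r(observed, retrieved))
```

Because the correlation coefficient is insensitive to a linear rescaling, a high value can coexist with the kind of bias at both ends of the AOD range that the abstract describes, so it is usually read together with a bias or error measure.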


2016 ◽  
Author(s):  
J. Huttunen ◽  
H. Kokkola ◽  
T. Mielonen ◽  
M. Mononen ◽  
A. Lipponen ◽  
...  

Abstract. In order to have a good estimate of the current forcing by anthropogenic aerosols, knowledge on past aerosol levels is needed. Aerosol optical depth (AOD) is a good measure for aerosol loading. However, dedicated measurements of AOD are only available from the 1990s onward. One option to lengthen the AOD time series beyond the 1990s is to retrieve AOD from surface solar radiation (SSR) measurements done with pyranometers. In this work, we have evaluated several inversion methods designed for this task. We compared a look-up table method based on radiative transfer modelling, a nonlinear regression method and four machine learning methods (Gaussian Process, Neural Network, Random Forest and Support Vector Machine) with AOD observations done with a sun photometer at an Aerosol Robotic Network (AERONET) site in Thessaloniki, Greece. Our results show that most of the machine learning methods produce AOD estimates comparable to the look-up table and nonlinear regression methods. All of the applied methods produced AOD values that corresponded well to the AERONET observations with the lowest correlation coefficient value being 0.87 for the Random Forest method. While many of the methods tended to slightly overestimate low AODs and underestimate high AODs, neural network and support vector machine showed overall better correspondence for the whole AOD range. The differences in producing both ends of the AOD range seem to be caused by differences in the aerosol composition. High AODs were in most cases those with high water vapour content which might affect the aerosol single scattering albedo (SSA) through uptake of water into aerosols. Our study indicates that machine learning methods benefit from the fact that they do not constrain the aerosol SSA in the retrieval, whereas the LUT method assumes a constant value for it.
This would also mean that machine learning methods could have potential in reproducing AOD from SSR even though SSA would have changed during the observation period.


2021 ◽  
Vol 11 (14) ◽  
pp. 6546 ◽  
Author(s):  
Fudi Chen ◽  
Yishuai Du ◽  
Tianlong Qiu ◽  
Zhe Xu ◽  
Li Zhou ◽  
...  

A recirculating aquaculture system (RAS) can reduce water and land requirements for intensive aquaculture production. However, a traditional RAS uses a fixed circulation flow rate for water treatment. In general, the water in an RAS is highly turbid only when the animals are fed and when they excrete. Therefore, RAS water quality regulation technology based on process control is proposed in this paper. The intelligent variable-flow RAS was designed based on the circulating pump-drum filter linkage working model. Machine learning methods were introduced to develop the intelligent regulation model to maintain a clean and stable water environment. Results showed that the long short-term memory network performed with the highest accuracy (training set 100%, test set 96.84%) and F1-score (training 100%, test 93.83%) among artificial neural networks. Optimization methods including grid search, cuckoo search, linear squares, and genetic algorithm were proposed to improve the classification ability of support vector machine models. Results showed that all support vector machine models passed cross-validation and could meet accuracy standards. In summary, the genetic algorithm support vector machine model (accuracy: training 100%, test 98.95%; F1-score: training 100%, test 99.17%) is suitable as an optimal variable-flow regulation model for an intelligent variable-flow RAS.
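The accuracy and F1-score figures quoted in this abstract follow from standard definitions, which the sketch below spells out. It assumes a binary labelling (the abstract does not say how many flow-regulation classes were used or how F1 was averaged), and the function name `accuracy_and_f1` and the example labels are hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    F1 is the harmonic mean of precision and recall for the class
    given by `positive`; it is reported as 0.0 when undefined.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    tp = fp = fn = correct = 0
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct += 1
        if p == positive and t == positive:
            tp += 1
        elif p == positive and t != positive:
            fp += 1
        elif p != positive and t == positive:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return correct / len(y_true), f1


if __name__ == "__main__":
    # Invented labels: 1 = "raise flow rate", 0 = "keep flow rate".
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    print(accuracy_and_f1(y_true, y_pred))
```

Accuracy and F1 can differ noticeably when the classes are imbalanced, which may be why the abstract reports both.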


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Jingyi Mu ◽  
Fang Wu ◽  
Aihua Zhang

In the era of big data, many urgent problems in all walks of life can be solved with big data techniques. Compared with the Internet, economy, industry, and aerospace fields, applications of big data in architecture are relatively few. In this paper, the values of Boston suburb houses are forecast from actual data by several machine learning methods. Based on the predictions, the government and developers can decide whether or not to develop real estate in the corresponding regions. Support vector machine (SVM), least squares support vector machine (LSSVM), and partial least squares (PLS) methods are used to forecast the home values, and the algorithms are compared according to the predicted results. Experiments show that although the data set exhibits serious nonlinearity, the SVM and LSSVM methods are superior to PLS in dealing with it. Because SVM solves a quadratic programming problem, it can find the global optimal solution and achieve the best forecasting effect. The computational efficiency of the algorithms is also compared according to their computing times.

