Machine Learning for Mass Spectrometry Data Analysis in Proteomics

With the rapid development of high-throughput techniques, mass spectrometry has been widely used for large-scale protein analysis. To search for existing proteins, discover biomarkers, and diagnose and prognose diseases, machine learning methods are applied in mass spectrometry data analysis. This paper reviews the applications of five kinds of machine learning methods to mass spectrometry data analysis from an algorithmic point of view, including support vector machine, decision tree, random forest, naive Bayesian classifier and deep learning.


Machine Learning Applications for Mass Spectrometry-Based Metabolomics

Metabolites ◽

10.3390/metabo10060243 ◽

2020 ◽

Vol 10 (6) ◽

pp. 243 ◽

Cited By ~ 7

Author(s):

Ulf W. Liebal ◽

An N. T. Phan ◽

Malvika Sudhakar ◽

Karthik Raman ◽

Lars M. Blank

Keyword(s):

Machine Learning ◽

Mass Spectrometry ◽

Data Analysis ◽

Metabolic Engineering ◽

Data Representation ◽

Heterogeneous Data ◽

Supervised Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

The metabolome of an organism depends on environmental factors and intracellular regulation and provides information about the physiological conditions. Metabolomics helps to understand disease progression in clinical settings or estimate metabolite overproduction for metabolic engineering. The most popular analytical metabolomics platform is mass spectrometry (MS). However, MS metabolome data analysis is complicated, since metabolites interact nonlinearly, and the data structures themselves are complex. Machine learning methods have become immensely popular for statistical analysis due to the inherent nonlinear data representation and the ability to process large and heterogeneous data rapidly. In this review, we address recent developments in using machine learning for processing MS spectra and show how machine learning generates new biological insights. In particular, supervised machine learning has great potential in metabolomics research because of the ability to supply quantitative predictions. We review here commonly used tools, such as random forest, support vector machines, artificial neural networks, and genetic algorithms. During processing steps, the supervised machine learning methods help with peak picking, normalization, and missing data imputation. For knowledge-driven analysis, machine learning contributes to biomarker detection, classification and regression, biochemical pathway identification, and carbon flux determination. Of particular relevance is the combination of different omics data to identify the contributions of the various regulatory levels. Our overview of the recent publications also highlights that data quality determines analysis quality, but also adds to the challenge of choosing the right model for the data. Machine learning methods applied to MS-based metabolomics ease data analysis and can support clinical decisions, guide metabolic engineering, and stimulate fundamental biological discoveries.


Fast-forward solver for inhomogeneous media using machine learning methods: artificial neural network, support vector machine and fuzzy logic

Neural Computing and Applications ◽

10.1007/s00521-016-2694-9 ◽

2016 ◽

Vol 29 (12) ◽

pp. 1583-1591 ◽

Cited By ~ 4

Author(s):

Mohammad Abdolrazzaghi ◽

Soheil Hashemy ◽

Ali Abdolali

Keyword(s):

Neural Network ◽

Machine Learning ◽

Artificial Neural Network ◽

Support Vector Machine ◽

Fuzzy Logic ◽

Inhomogeneous Media ◽

Support Vector ◽

Learning Methods ◽

Network Support ◽

Machine Learning Methods


Prediction of Collapsibility of Loess of Construction Sites in Xining Based on Machine Learning Methods

10.21203/rs.3.rs-307514/v1 ◽

2021 ◽

Author(s):

Qifei Zhao ◽

Xiaojun Li ◽

Yunning Cao ◽

Zhikun Li ◽

Jixin Fan

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Training Data ◽

Support Vector ◽

Engineering Practice ◽

Burial Depth ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods ◽

North East

Abstract. Collapsibility of loess is a significant factor affecting engineering construction in loess areas, and testing the collapsibility of loess is costly. In this study, a total of 4,256 loess samples are collected from the north, east, west and middle regions of Xining. 70% of the samples are used to generate the training data set, and the rest are used to generate the verification data set, so as to construct and validate the machine learning models. The six most important factors are selected from thirteen factors by using Grey Relational analysis and multicollinearity analysis: burial depth, water content, specific gravity of soil particles, void rate, geostatic stress and plasticity limit. In order to predict the collapsibility of loess, four machine learning methods, Support Vector Machine (SVM), Random Subspace Based Support Vector Machine (RSSVM), Random Forest (RF) and Naïve Bayes Tree (NBTree), are studied and compared. The receiver operating characteristic (ROC) curve indicators, standard error (SD) and 95% confidence interval (CI) are used to verify and compare the models in different research areas. The results show that the RF model is the most efficient in predicting the collapsibility of loess in Xining, and its average AUC is above 80%, so it can be used in engineering practice.
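The 70/30 split and the AUC indicator mentioned in this abstract can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the function names `split_samples` and `roc_auc`, the fixed random seed, and the toy scores are assumptions made for illustration, and the AUC is computed with the pairwise-ranking definition rather than by tracing the ROC curve.

```python
import random


def split_samples(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split it into training and verification sets."""
    rng = random.Random(seed)          # fixed seed (an assumption) for reproducibility
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Sample identifiers only; the real study uses measured soil properties.
    train, verify = split_samples(range(4256))
    print(len(train), len(verify))

    # Hypothetical labels (1 = collapsible) and model scores.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise definition is quadratic in the number of samples, which is acceptable for a sketch; a sorted-rank implementation would be preferable for large verification sets.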


Implementing Machine Learning Methods for Ballpoint Pen Ink Classification based on Mass Spectrometry Data: Toward a Forensic Application

2021 18th International Joint Conference on Computer Science and Software Engineering (JCSSE) ◽

10.1109/jcsse53117.2021.9493823 ◽

2021 ◽

Author(s):

Pirada Boonna ◽

Chawanya Chaiwan ◽

Somrudee Deepaisarn ◽

Nattapon Simanon ◽

Onrapak Reamtong ◽

...

Keyword(s):

Machine Learning ◽

Mass Spectrometry ◽

Mass Spectrometry Data ◽

Forensic Application ◽

Learning Methods ◽

Machine Learning Methods ◽

Ballpoint Pen Ink


Retrieval of aerosol optical depth from surface solar radiation measurements using machine learning algorithms, non-linear regression and a radiative transfer-based look-up table

Atmospheric Chemistry and Physics ◽

10.5194/acp-16-8181-2016 ◽

2016 ◽

Vol 16 (13) ◽

pp. 8181-8191 ◽

Cited By ~ 10

Author(s):

Jani Huttunen ◽

Harri Kokkola ◽

Tero Mielonen ◽

Mika Esa Juhani Mononen ◽

Antti Lipponen ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Linear Regression ◽

Support Vector ◽

Learning Methods ◽

Surface Solar Radiation ◽

Machine Learning Methods ◽

Look Up Table ◽

Non Linear

Abstract. In order to have a good estimate of the current forcing by anthropogenic aerosols, knowledge on past aerosol levels is needed. Aerosol optical depth (AOD) is a good measure for aerosol loading. However, dedicated measurements of AOD are only available from the 1990s onward. One option to lengthen the AOD time series beyond the 1990s is to retrieve AOD from surface solar radiation (SSR) measurements taken with pyranometers. In this work, we have evaluated several inversion methods designed for this task. We compared a look-up table method based on radiative transfer modelling, a non-linear regression method and four machine learning methods (Gaussian process, neural network, random forest and support vector machine) with AOD observations carried out with a sun photometer at an Aerosol Robotic Network (AERONET) site in Thessaloniki, Greece. Our results show that most of the machine learning methods produce AOD estimates comparable to the look-up table and non-linear regression methods. All of the applied methods produced AOD values that corresponded well to the AERONET observations with the lowest correlation coefficient value being 0.87 for the random forest method. While many of the methods tended to slightly overestimate low AODs and underestimate high AODs, neural network and support vector machine showed overall better correspondence for the whole AOD range. The differences in producing both ends of the AOD range seem to be caused by differences in the aerosol composition. High AODs were in most cases those with high water vapour content which might affect the aerosol single scattering albedo (SSA) through uptake of water into aerosols. Our study indicates that machine learning methods benefit from the fact that they do not constrain the aerosol SSA in the retrieval, whereas the LUT method assumes a constant value for it. 
This would also mean that machine learning methods could have potential in reproducing AOD from SSR even though SSA would have changed during the observation period.
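The agreement measure quoted in this abstract, the correlation coefficient between retrieved and observed AOD, can be sketched as follows. This assumes the Pearson correlation coefficient is meant (the abstract does not name the variant), and the function name `pearson_r` and the AOD values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical values: reference AOD and AOD retrieved by some inversion method.
    observed = [0.10, 0.22, 0.35, 0.48, 0.61]
    retrieved = [0.14, 0.21, 0.33, 0.44, 0.52]
    print(round(pearson_r(observed, retrieved), 3))
```

A high correlation alone does not rule out the bias pattern the abstract describes (overestimated low AODs, underestimated high AODs), so in practice it would be read alongside a bias or slope statistic.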


Retrieval of aerosol optical depth from surface solar radiation measurements using machine learning algorithms, nonlinear regression and a radiative transfer based look-up table

10.5194/acp-2016-58 ◽

2016 ◽

Author(s):

J. Huttunen ◽

H. Kokkola ◽

T. Mielonen ◽

M. Mononen ◽

A. Lipponen ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Nonlinear Regression ◽

Support Vector ◽

Learning Methods ◽

Surface Solar Radiation ◽

Machine Learning Methods ◽

Look Up Table

Abstract. In order to have a good estimate of the current forcing by anthropogenic aerosols, knowledge on past aerosol levels is needed. Aerosol optical depth (AOD) is a good measure for aerosol loading. However, dedicated measurements of AOD are only available from the 1990s onward. One option to lengthen the AOD time series beyond the 1990s is to retrieve AOD from surface solar radiation (SSR) measurements done with pyranometers. In this work, we have evaluated several inversion methods designed for this task. We compared a look-up table method based on radiative transfer modelling, a nonlinear regression method and four machine learning methods (Gaussian Process, Neural Network, Random Forest and Support Vector Machine) with AOD observations done with a sun photometer at an Aerosol Robotic Network (AERONET) site in Thessaloniki, Greece. Our results show that most of the machine learning methods produce AOD estimates comparable to the look-up table and nonlinear regression methods. All of the applied methods produced AOD values that corresponded well to the AERONET observations with the lowest correlation coefficient value being 0.87 for the Random Forest method. While many of the methods tended to slightly overestimate low AODs and underestimate high AODs, neural network and support vector machine showed overall better correspondence for the whole AOD range. The differences in producing both ends of the AOD range seem to be caused by differences in the aerosol composition. High AODs were in most cases those with high water vapour content which might affect the aerosol single scattering albedo (SSA) through uptake of water into aerosols. Our study indicates that machine learning methods benefit from the fact that they do not constrain the aerosol SSA in the retrieval, whereas the LUT method assumes a constant value for it.
This would also mean that machine learning methods could have potential in reproducing AOD from SSR even though SSA would have changed during the observation period.


Comparative Study on Theoretical and Machine Learning Methods for Acquiring Compressed Liquid Densities of 1,1,1,2,3,3,3-Heptafluoropropane (R227ea) via Song and Mason Equation, Support Vector Machine, and Artificial Neural Networks

Applied Sciences ◽

10.3390/app6010025 ◽

2016 ◽

Vol 6 (1) ◽

pp. 25 ◽

Cited By ~ 18

Author(s):

Hao Li ◽

Xindong Tang ◽

Run Wang ◽

Fan Lin ◽

Zhijian Liu ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Support Vector Machine ◽

Artificial Neural Networks ◽

Comparative Study ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

Compressed Liquid ◽

Artificial Neural


Comparison of machine learning methods for stationary wavelet entropy-based multiple sclerosis detection: decision tree, k-nearest neighbors, and support vector machine

SIMULATION ◽

10.1177/0037549716666962 ◽

2016 ◽

Vol 92 (9) ◽

pp. 861-871 ◽

Cited By ~ 54

Author(s):

Yudong Zhang ◽

Siyuan Lu ◽

Xingxing Zhou ◽

Ming Yang ◽

Lenan Wu ◽

...

Keyword(s):

Machine Learning ◽

Multiple Sclerosis ◽

Support Vector Machine ◽

Decision Tree ◽

Nearest Neighbors ◽

Support Vector ◽

Learning Methods ◽

K Nearest Neighbors ◽

Wavelet Entropy ◽

Machine Learning Methods


Design of an Intelligent Variable-Flow Recirculating Aquaculture System Based on Machine Learning Methods

Applied Sciences ◽

10.3390/app11146546 ◽

2021 ◽

Vol 11 (14) ◽

pp. 6546

Author(s):

Fudi Chen ◽

Yishuai Du ◽

Tianlong Qiu ◽

Zhe Xu ◽

Li Zhou ◽

...

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Support Vector ◽

Recirculating Aquaculture System ◽

Learning Methods ◽

Variable Flow ◽

Machine Learning Methods ◽

Recirculating Aquaculture ◽

Aquaculture System ◽

Regulation Model

A recirculating aquaculture system (RAS) can reduce water and land requirements for intensive aquaculture production. However, a traditional RAS uses a fixed circulation flow rate for water treatment. In general, the water in an RAS is highly turbid only when the animals are fed and when they excrete. Therefore, RAS water quality regulation technology based on process control is proposed in this paper. The intelligent variable-flow RAS was designed based on the circulating pump-drum filter linkage working model. Machine learning methods were introduced to develop the intelligent regulation model to maintain a clean and stable water environment. Results showed that the long short-term memory network performed with the highest accuracy (training set 100%, test set 96.84%) and F1-score (training 100%, test 93.83%) among artificial neural networks. Optimization methods including grid search, cuckoo search, linear squares, and genetic algorithm were proposed to improve the classification ability of support vector machine models. Results showed that all support vector machine models passed cross-validation and could meet accuracy standards. In summary, the genetic algorithm support vector machine model (accuracy: training 100%, test 98.95%; F1-score: training 100%, test 99.17%) is suitable as an optimal variable-flow regulation model for an intelligent variable-flow RAS.
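The two metrics reported in this abstract, accuracy and F1-score, can be computed from predicted and true labels as in the sketch below. It assumes a binary labelling with a single positive class, which the abstract does not specify; the function name `accuracy_and_f1` and the example labels are hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Hypothetical labels: 1 = "raise the flow rate", 0 = "keep the flow rate".
    truth = [1, 1, 0, 0, 1]
    predicted = [1, 0, 0, 1, 1]
    print(accuracy_and_f1(truth, predicted))
```

Reporting both numbers is useful because accuracy can look high on imbalanced data, while F1 reflects how well the positive class is recovered.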


Housing Value Forecasting Based on Machine Learning Methods

Abstract and Applied Analysis ◽

10.1155/2014/648047 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 8

Author(s):

Jingyi Mu ◽

Fang Wu ◽

Aihua Zhang

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Big Data ◽

Least Squares ◽

Optimal Solution ◽

Support Vector ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods ◽

The Government

In the era of big data, many urgent problems in all walks of life can be addressed with big data techniques. Compared with the Internet, economy, industry, and aerospace fields, applications of big data in architecture are relatively few. In this paper, on the basis of actual data, the values of Boston suburb houses are forecast by several machine learning methods. According to the predictions, the government and developers can decide whether or not to develop real estate in the corresponding regions. In this paper, support vector machine (SVM), least squares support vector machine (LSSVM), and partial least squares (PLS) methods are used to forecast the home values, and these algorithms are compared according to the predicted results. Experiments show that although the data set exhibits serious nonlinearity, the SVM and LSSVM methods are superior to PLS in dealing with it. The global optimal solution can be found and the best forecasting effect can be achieved by SVM because it solves a quadratic programming problem. In this paper, the computational efficiencies of the algorithms are also compared according to their computing times.
