Machine Learning for Ionic Liquid Toxicity Prediction

In addition to suitable physicochemical properties, low toxicity is desirable when selecting ionic liquids (ILs) for specific applications. In this context, machine learning (ML) models were developed to predict the toxicity of ILs toward the leukemia rat cell line IPC-81, based on an extended experimental dataset. Following a systematic procedure of framework construction, hyperparameter optimization, model training, and evaluation, the feedforward neural network (FNN) and support vector machine (SVM) algorithms were adopted to predict IL toxicity directly from molecular structure. With their structures optimized by five-fold cross-validation, the two ML models were established and evaluated using IL structural descriptors as inputs. Both models exhibited high predictive accuracy, with the SVM model performing slightly better than the FNN model. For the SVM model, the coefficients of determination (R²) were 0.9289 and 0.9202 for the training and test sets, respectively. This predictive performance and generalization ability make our models useful for the computer-aided molecular design (CAMD) of environmentally friendly ILs.
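
The determination coefficients above and the five-fold cross-validation used to select the model structures can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: `fit` is a hypothetical callable that trains a model (in this setting, an FNN or SVM with one candidate hyperparameter setting) and returns a function that predicts a single sample.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def k_fold_indices(n, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def cv_score(fit, X, y, k=5):
    """Mean validation R^2 of a model-building function over k folds.

    `fit(X_train, y_train)` (hypothetical) must return a one-sample predictor.
    """
    scores = []
    for train, val in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(r2_score([y[i] for i in val], [model(X[i]) for i in val]))
    return sum(scores) / len(scores)
```

Hyperparameter optimization then reduces to calling `cv_score` once per candidate setting and keeping the setting with the highest mean validation R²; the chosen model is refit on the full training set and scored once on the held-out test set.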

A Support Vector Machine Model with Hyperparameters Optimised by Mind Evolutionary Algorithm for Assessing Permeability of Rock

Advances in Civil Engineering, Vol 2020, pp. 1-12, 2020. DOI: 10.1155/2020/4718493

Author(s): Wenjin Zhu, Zhiming Chao, Guotao Ma

Keyword(s): Machine Learning; Support Vector Machine; Evolutionary Algorithm; Predictive Accuracy; Support Vector; Particle Swarm Algorithm; Learning Models; Machine Model; SVM Model; Machine Learning Models

In this paper, a database of rock permeability was established from the existing literature. Based on this database, a support vector machine (SVM) model with hyperparameters optimised by the Mind Evolutionary Algorithm (MEA) was proposed to predict the permeability of rock. Genetic Algorithm (GA)-SVM and Particle Swarm Optimisation (PSO)-SVM models were also constructed to compare the improvement in predictive accuracy achieved by MEA with that achieved by GA and PSO. The following conclusions were drawn. MEA remarkably increases the predictive accuracy of the constructed machine learning models within a few iterations and shows better optimisation performance than GA and PSO. MEA-SVM has the best forecasting performance, followed by PSO-SVM, while GA-SVM is the least accurate of the three. The proposed MEA-SVM model accurately predicts rock permeability, indicating that it has satisfactory generalisation and extrapolation capacity.
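
The abstract does not spell out how MEA works. The sketch below is a heavily simplified version of its usual loop, written under stated assumptions: each group is reduced to its best individual; "similartaxis" (local competition) scatters Gaussian samples around each group's centre and keeps the best; and "dissimilation" (global competition) promotes the best groups to superior status and re-seeds the temporary ones. The quadratic objective in the usage note is a hypothetical stand-in for the cross-validated SVM error over hyperparameters such as the penalty C and kernel width; group sizes and the step-size decay are arbitrary choices.

```python
import random

def mea_minimize(objective, bounds, n_superior=3, n_temporary=2,
                 group_size=10, iterations=40, sigma=0.2, decay=0.9, seed=0):
    """Simplified Mind Evolutionary Algorithm; groups are (score, centre) pairs."""
    rng = random.Random(seed)

    def clip(v, lo, hi):
        return min(max(v, lo), hi)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def similartaxis(group, scale):
        # Local competition: sample around the centre, keep the best individual
        score, centre = group
        for _ in range(group_size - 1):
            cand = [clip(c + rng.gauss(0.0, scale * (hi - lo)), lo, hi)
                    for c, (lo, hi) in zip(centre, bounds)]
            cand_score = objective(cand)
            if cand_score < score:
                score, centre = cand_score, cand
        return (score, centre)

    # Initialisation: rank random individuals and use the best as group centres
    pool = [random_point() for _ in range(group_size * (n_superior + n_temporary))]
    ranked = sorted(((objective(p), p) for p in pool), key=lambda g: g[0])
    superior = ranked[:n_superior]
    temporary = ranked[n_superior:n_superior + n_temporary]

    scale = sigma
    for _ in range(iterations):
        superior = [similartaxis(g, scale) for g in superior]
        temporary = [similartaxis(g, scale) for g in temporary]
        # Dissimilation: global competition decides the superior groups
        merged = sorted(superior + temporary, key=lambda g: g[0])
        superior = merged[:n_superior]
        # Re-seed temporary groups so the search keeps exploring
        temporary = [(objective(p), p) for p in (random_point() for _ in range(n_temporary))]
        scale *= decay
    return min(superior, key=lambda g: g[0])
```

For example, `mea_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [(-5, 5), (-5, 5)])` returns a (score, point) pair close to the minimum at (1, -2). Replacing the objective with one that trains an SVM and returns its validation error would turn this into an MEA-SVM-style hyperparameter search.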

An Interpretable Machine Learning Model for Daily Global Solar Radiation Prediction

Energies, Vol 14 (21), pp. 7367, 2021. DOI: 10.3390/en14217367

Author(s): Mohamed Chaibi, EL Mahjoub Benghoulam, Lhoussaine Tarik, Mohamed Berrada, Abdellah El Hmaidi

Keyword(s): Machine Learning; Solar Radiation; Predictive Accuracy; Sunshine Duration; Predictive Performance; Global Solar Radiation; Gradient Boosting; Support Vector; Light Gradient; Testing Stage

Machine learning (ML) models are commonly used in solar modeling because of their high predictive accuracy. However, the predictions of these models are difficult to explain and trust. This paper aims to demonstrate the utility of two interpretation techniques for explaining and improving the predictions of ML models. We first compared the predictive performance of Light Gradient Boosting (LightGBM) with three benchmark models, namely multilayer perceptron (MLP), multiple linear regression (MLR), and support vector regression (SVR), for estimating the daily global solar radiation (H) in the city of Fez, Morocco. The predictions of the most accurate model were then explained with two model-agnostic explanation techniques: permutation feature importance (PFI) and Shapley additive explanations (SHAP). The results indicated that LightGBM (R² = 0.9377, RMSE = 0.4827 kWh/m², MAE = 0.3614 kWh/m²) provides predictive accuracy similar to that of SVR and outperformed MLP and MLR in the testing stage. Both PFI and SHAP showed that extraterrestrial solar radiation (H0) and sunshine duration fraction (SF) are the two most important parameters affecting the estimation of H. Moreover, SHAP revealed how each feature influences the LightGBM estimates. The predictive accuracy of the LightGBM model improved slightly after the features were re-examined: the model combining H0, SF, and relative humidity (RH) performed better than the model with all features.
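
Permutation feature importance needs only a fitted model and a dataset: shuffle one input column, re-predict, and record how much the error grows. A minimal sketch, assuming mean squared error as the score; the toy linear `predict` function in the usage note stands in for the trained LightGBM model, and in this study the columns would be inputs such as H0 and SF.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace column j with its shuffled version, leave other columns intact
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

For a model that ignores its second input, such as `permutation_importance(lambda r: 3 * r[0], X, y)`, the second importance is exactly zero. Because the model is treated as a black box, the same function works unchanged for LightGBM, SVR, or MLP predictors; SHAP, which attributes each individual prediction to the features, is not reproduced here.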

NLOS Multipath Classification of GNSS Signal Correlation Output Using Machine Learning

Sensors, Vol 21 (7), pp. 2503, 2021. DOI: 10.3390/s21072503

Author(s): Taro Suzuki, Yoshiharu Amano

Keyword(s): Machine Learning; Satellite System; Training Data; Support Vector; Positioning Errors; Automated Method; Global Navigation Satellite; Better Than; Signal Correlation

This paper proposes a method for detecting non-line-of-sight (NLOS) multipath, which causes large positioning errors in global navigation satellite systems (GNSS). We use the GNSS signal correlation output, the most primitive output of GNSS signal processing, to detect NLOS multipath with machine learning. NLOS multipath distorts the shape of the multi-correlator outputs, so features describing this shape are used to discriminate NLOS multipath. We implement two supervised learning methods, a support vector machine (SVM) and a neural network (NN), and compare their performance. In addition, we propose an automated method for collecting LOS and NLOS training data for machine learning. An evaluation of the proposed NLOS detection method in an urban environment confirmed that the NN performed better than the SVM, and 97.7% of NLOS signals were correctly discriminated.
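
The abstract does not list the shape features used. As a hypothetical illustration, the sketch below computes three simple descriptors from multi-correlator outputs sampled at symmetric code offsets (in chips): the delay of the correlation peak, the early/late asymmetry, and the squared deviation from the ideal triangular autocorrelation 1 - |τ|. The feature names and the toy distorted signal are assumptions made for illustration, not the authors' feature set.

```python
def triangle(tau):
    """Ideal normalised code autocorrelation: 1 - |tau| within one chip, else 0."""
    return max(0.0, 1.0 - abs(tau))

def correlator_features(delays, outputs):
    """Hypothetical shape features of multi-correlator outputs.

    delays: correlator offsets in chips, symmetric around 0; outputs: amplitudes.
    """
    peak = max(outputs)
    norm = [o / peak for o in outputs]
    peak_delay = delays[norm.index(1.0)]
    # Asymmetry: mismatch between mirrored early and late correlators
    by_delay = dict(zip(delays, norm))
    asymmetry = sum(abs(by_delay[d] - by_delay[-d]) for d in delays if d > 0)
    # Distortion: squared deviation from an ideal triangle centred on the peak
    distortion = sum((v - triangle(d - peak_delay)) ** 2 for d, v in zip(delays, norm))
    return {"peak_delay": peak_delay, "asymmetry": asymmetry, "distortion": distortion}
```

With `delays = [i * 0.25 for i in range(-6, 7)]`, an undistorted output `[triangle(d) for d in delays]` gives zero asymmetry and distortion, whereas superimposing an attenuated, delayed replica, `[triangle(d) + 0.6 * triangle(d - 0.5) for d in delays]`, makes both non-zero. Feature vectors like these, one per satellite and epoch, would then be passed to the SVM or NN classifier together with LOS/NLOS labels.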

Civil Aeroengine Fault Diagnosis Based on Fuzzy Least Square Support Vector Machine

Applied Mechanics and Materials, Vol 130-134, pp. 2047-2050, 2011. DOI: 10.4028/www.scientific.net/amm.130-134.2047 (Cited by ~1)

Author(s): Hong Chun Qu, Xie Bin Ding

Keyword(s): Artificial Intelligence; Machine Learning; Support Vector Machine; Fault Diagnosis; Coefficient Matrix; Least Square; Support Vector; Influence Coefficient; Structural Risk; Better Than

SVM (support vector machine) is a new artificial intelligence methodology based on the structural risk minimization principle. It generalizes better than traditional machine learning methods and shows a powerful ability to learn from limited samples. To address the lack of engine fault samples, FLS-SVM (fuzzy least squares SVM), an improved SVM, is applied. Ten common engine faults are trained and recognized in the paper. The simulated data are generated from the influence coefficient matrix of the PW4000-94 engine at cruise, and the results show that the diagnostic accuracy of FLS-SVM is better than that of LS-SVM.
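
LS-SVM replaces the quadratic program of a standard SVM with a single linear system, and a fuzzy variant can weight each training sample by a membership value. The sketch below implements a binary LS-SVM classifier with an RBF kernel in the standard Suykens formulation and adds per-sample memberships s_i by scaling the regularisation term to 1/(γ s_i). How the paper assigns memberships, and how the ten-fault multiclass problem is split into binary tasks, is not given in the abstract, so the membership values in the usage note are hypothetical.

```python
import math

def rbf(a, b, width=1.0):
    """Gaussian (RBF) kernel."""
    return math.exp(-sum((x - z) ** 2 for x, z in zip(a, b)) / (2.0 * width ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def train_fls_svm(X, y, gamma=10.0, memberships=None, width=1.0):
    """Binary LS-SVM classifier (labels +1/-1) with optional fuzzy memberships s_i.

    Solves [[0, y^T], [y, Omega + diag(1 / (gamma * s_i))]] [b, alpha] = [0, 1],
    where Omega_ij = y_i * y_j * K(x_i, x_j).
    """
    n = len(X)
    s = memberships or [1.0] * n
    A = [[0.0] + [float(label) for label in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            value = y[i] * y[j] * rbf(X[i], X[j], width)
            if i == j:
                value += 1.0 / (gamma * s[i])  # fuzzy-weighted regularisation
            row.append(value)
        A.append(row)
    solution = solve(A, [0.0] + [1.0] * n)
    b, alpha = solution[0], solution[1:]

    def predict(x):
        score = sum(a * label * rbf(xi, x, width) for a, label, xi in zip(alpha, y, X)) + b
        return 1 if score >= 0 else -1

    return predict
```

For example, `train_fls_svm(X, y, memberships=[1.0, 1.0, 0.2, 1.0, 1.0, 0.2])` down-weights the third and sixth samples. Lowering a sample's membership loosens its fit, which is how a fuzzy LS-SVM can reduce the influence of noisy or atypical fault samples.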

Remote sensing inversion of water quality in coastal sea area based on machine learning: a case study of Shenzhen bay, China

2021. DOI: 10.5194/egusphere-egu21-1972

Author(s): Xiaotong Zhu, Jinhui Jeanne Huang

Keyword(s): Machine Learning; Remote Sensing; Water Quality; Predictive Accuracy; Water Environment; Quality Parameters; Machine Learning Algorithms; Dynamic Monitoring; Support Vector; Seawater Quality

Remote sensing offers wide coverage, speed, and low cost for long-term dynamic monitoring of the water environment. With the rapid development of artificial intelligence, machine learning has enabled remote sensing inversion of seawater quality to achieve higher prediction accuracy. However, because water quality parameters differ in their physicochemical properties, algorithm performance varies considerably from one parameter to another. To improve the predictive accuracy of seawater quality parameters, we proposed a technical framework for identifying the optimal machine learning algorithm for each parameter using Sentinel-2 satellite and in-situ seawater sample data. In the study, we selected three algorithms, i.e. support vector regression (SVR), XGBoost, and deep learning (DL), and four seawater quality parameters, i.e. dissolved oxygen (DO), total dissolved solids (TDS), turbidity (TUR), and chlorophyll-a (Chla). The results show that SVR is the most accurate algorithm for DO inversion (R² = 0.81), XGBoost has the best accuracy for Chla and TUR inversion (R² = 0.75 and 0.78, respectively), and DL performs best for TDS (R² = 0.789). Overall, this research provides theoretical support for high-precision remote sensing inversion of offshore seawater quality parameters based on machine learning.

Application of machine learning in predicting construction project profit in Ghana using Support Vector Regression Algorithm (SVRA)

Engineering Construction & Architectural Management, ahead-of-print, 2021. DOI: 10.1108/ecam-08-2020-0618

Author(s): Emmanuel Adinyira, Emmanuel Akoi-Gyebi Adjei, Kofi Agyekum, Frank Desmond Kofi Fugar

Keyword(s): Machine Learning; Support Vector Regression; Cash Flow; Predictive Accuracy; Model Development; Construction Project; Support Vector; Sensitivity Index; Content Type; Hyperparameter Selection

Purpose: Knowledge of the effect of various cash-flow factors on expected project profit is important for effectively managing productivity on construction projects. This study was conducted to develop a machine learning Support Vector Regression Algorithm (SVRA) to predict construction project profit in Ghana and to test its sensitivity.

Design/methodology/approach: The study relied on data from 150 institutional projects executed within the past five years (2014–2018) to develop the model. Eighty percent (80%) of the data from the 150 projects was used in the hyperparameter selection and final training phases of model development, and the remaining 20% was used for model testing. Using MATLAB for support vector regression, the parameters available for tuning were the epsilon value, the kernel scale, the box constraint, and standardisation. A sensitivity index was computed to determine the degree to which the independent variables affect the dependent variable.

Findings: The developed model's predictions fitted the data perfectly and explained all the variability of the response data around its mean. An average predictive accuracy of 73.66% was achieved with all the variables on the different projects in validation. The developed SVR model was sensitive to labour and loan.

Originality/value: The developed SVRA combines variation, defective works and labour with the other financial constraints that have been used as variables in previous studies. It will help contractors predict, at project commencement, the profit on completion, and will also provide information on the effect of changes in cash-flow factors on profit.
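
The tuning options and data split described above can be illustrated with small building blocks. A minimal sketch, assuming the common definitions: the epsilon-insensitive loss ignores errors smaller than epsilon, the kernel scale divides the inputs before a Gaussian kernel is applied, standardisation z-scores each feature, and an 80/20 split of 150 projects yields 120 training and 30 test projects. These Python functions are illustrative and are not MATLAB's implementation; the box constraint, which caps the dual coefficients during optimisation, is omitted.

```python
import math
import random

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """SVR loss: errors inside the epsilon tube cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def rbf_kernel(a, b, kernel_scale=1.0):
    """Gaussian kernel applied after dividing inputs by a kernel scale."""
    return math.exp(-sum(((x - z) / kernel_scale) ** 2 for x, z in zip(a, b)))

def standardise(columns):
    """Z-score each feature column (population standard deviation)."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        out.append([(v - mean) / sd for v in col])
    return out

def train_test_split(n, train_fraction=0.8, seed=0):
    """Shuffle indices 0..n-1 and split them into training and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * n)
    return idx[:cut], idx[cut:]
```

For example, `train_test_split(150)` returns 120 training and 30 test indices, and `epsilon_insensitive_loss(1.0, 1.05, 0.1)` is 0.0 because the error lies inside the tube. A larger epsilon makes the fitted function flatter and less sensitive to small errors, which is why it is tuned alongside the kernel scale.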

A proof-of-concept study applying machine learning methods to putative risk factors for eating disorders: results from the multi-centre European project on healthy eating

Psychological Medicine, pp. 1-10, 2021. DOI: 10.1017/s003329172100489x

Author(s): I. Krug, J. Linardon, C. Greenwood, G. Youssef, J. Treasure, ...

Keyword(s): Machine Learning; Risk Factors; Logistic Regression; Predictive Accuracy; Area Under The Curve; Prediction Rule; Predictive Performance; Individual Risk; European Project; Wide Range

Background: Despite a wide range of proposed risk factors and theoretical models, prediction of eating disorder (ED) onset remains poor. This study undertook the first comparison of two machine learning (ML) approaches [penalised logistic regression (LASSO) and prediction rule ensembles (PREs)] with conventional logistic regression (LR) models to enhance the prediction of ED onset and differential ED diagnoses from a range of putative risk factors.

Method: Data were part of a European project and comprised 1402 participants: 642 ED patients [52% with anorexia nervosa (AN) and 40% with bulimia nervosa (BN)] and 760 controls. The Cross-Cultural Risk Factor Questionnaire, which retrospectively assesses a range of sociocultural and psychological ED risk factors occurring before the age of 12 years (46 predictors in total), was used.

Results: All three statistical approaches had satisfactory model accuracy, with an average area under the curve (AUC) of 86% for predicting ED onset and 70% for predicting AN v. BN. Predictive performance was greatest for the two regression methods (LR and LASSO), although the PRE technique relied on fewer predictors with comparable accuracy. The individual risk factors differed depending on the outcome classification (EDs v. non-EDs and AN v. BN).

Conclusions: Even though conventional LR performed comparably to the ML approaches in terms of predictive accuracy, the ML methods produced more parsimonious predictive models. ML approaches offer a viable way to modify screening practices for ED risk that balance accuracy against participant burden.
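
The reported AUC values have a direct probabilistic reading: an AUC of 0.86 means that, for a randomly chosen ED patient and a randomly chosen control, the model assigns the patient the higher risk score 86% of the time. A minimal sketch of that pairwise definition, with ties counted as one half; the scores and labels in the usage note are made up.

```python
def auc(scores, labels):
    """ROC AUC: probability that a random positive outranks a random negative.

    labels are 1 (positive) or 0 (negative); tied scores count as half a win.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])` returns 0.75, since three of the four positive/negative pairs are ranked correctly. The pairwise loop is quadratic in sample size; a rank-based formula gives the same value faster on large datasets.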

Novel Genetic Variants of Hepatitis B Virus in Fulminant Hepatitis

Journal of Pathogens, Vol 2017, pp. 1-6, 2017. DOI: 10.1155/2017/1231204 (Cited by ~4)

Author(s): Jack Bee Chook, Yun Fong Ngeow, Kok Keng Tee, Suat Cheng Peh, Rosmawati Mohamed

Keyword(s): Hepatitis B Virus; Hepatitis B; Predictive Value; Fulminant Hepatitis; Predictive Accuracy; Stop Codon; Acute Infection; Support Vector; SVM Model; B Virus

Fulminant hepatitis (FH) is a life-threatening liver disease characterised by intense immune attack and massive liver cell death. The common precore stop codon mutation of hepatitis B virus (HBV), A1896, is frequently associated with FH but lacks specificity. This study attempts to uncover all possible viral nucleotides that are specifically associated with FH through a compiled sequence analysis of FH and non-FH cases from acute infection. We retrieved 67 FH and 280 acute non-FH cases of hepatitis B from GenBank and applied a support vector machine (SVM) model to seek candidate nucleotides that are highly predictive of FH. The six best candidates, C2129 (85.3%), T720 (83.0%), Y2131 (82.4%), T2013 (82.1%), K2048 (82.1%), and A2512 (82.1%), were used to build an SVM model with a top predictive accuracy of 92.5%. This model gave high specificity (99.3%), positive predictive value (95.6%), and negative predictive value (92.1%), but only moderate sensitivity (64.2%). We successfully built an SVM model comprising six variants that are highly predictive of and specific for FH: four in the core region and one each in the polymerase and surface regions. These variants indicate that intracellular virion/core retention could play an important role in the progression to FH.
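
The abstract reports percentages rather than the confusion matrix. The counts below (43 true positives, 24 false negatives, 278 true negatives, and 2 false positives) are back-calculated from the reported figures and the 67/280 case split, so they are an inference rather than published data. They reproduce all five reported percentages and show that 92.5% is the overall accuracy of the six-variant model.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Counts back-calculated (not reported) for 67 FH and 280 non-FH cases
metrics = confusion_metrics(tp=43, fn=24, tn=278, fp=2)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
# sensitivity: 64.2%, specificity: 99.3%, ppv: 95.6%, npv: 92.1%, accuracy: 92.5%
```

The arithmetic also explains the trade-off noted above: with only 2 false positives among 280 controls, specificity and PPV are high, while 24 missed FH cases among 67 keep sensitivity moderate.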

Predicting dengue importation into Europe, using machine learning and model-agnostic methods

2019. DOI: 10.1101/19013383

Author(s): Donald Salami, Carla Alexandra Sousa, Maria do Rosário Oliveira Martins, César Capinha

Keyword(s): Machine Learning; Operating Characteristic; Predictive Accuracy; Predictive Performance; Machine Learning Algorithms; Transport Network; Air Transport; Health Concern; Centrality Measures; Network Centrality

The geographical spread of dengue is a global public health concern. This spread is largely mediated by the importation of dengue from endemic to non-endemic areas via the increasing connectivity of the global air transport network. The dynamic nature and intrinsic heterogeneity of the air transport network make it challenging to predict dengue importation.

Here, we explore the capabilities of state-of-the-art machine learning algorithms to predict dengue importation. We trained four machine learning classifiers using 6 years of historical dengue importation data for 21 countries in Europe, together with connectivity indices mediating importation and air transport network centrality measures. The predictive performance of the classifiers was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Finally, we applied practical model-agnostic methods to provide an in-depth explanation of our optimal model's predictions on a global and local scale.

Our best-performing model achieved high predictive accuracy, with an AUC of 0.94 and a maximized sensitivity score of 0.88. The predictor variables identified as most important were the source country's dengue incidence rate, population size, and volume of air passengers. Network centrality measures, describing the positioning of European countries within the air travel network, also influenced the predictions.

We demonstrated the high predictive performance of a machine learning model in predicting dengue importation and the utility of model-agnostic methods in offering a comprehensive understanding of the reasons behind the predictions. Similar approaches can be used to develop an operational early warning surveillance system for dengue importation.
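
The abstract does not name the centrality measures used. As an illustration only, the sketch below computes two common ones from a list of flight routes: normalised degree centrality (the share of other nodes a node is directly connected to) and passenger-weighted in-strength (total inbound passengers). The country codes and passenger numbers in the usage note are hypothetical.

```python
from collections import defaultdict

def centrality(routes):
    """Degree centrality and passenger in-strength from (origin, destination, passengers) routes."""
    neighbours = defaultdict(set)
    in_strength = defaultdict(float)
    for origin, dest, passengers in routes:
        # Degree treats the network as undirected
        neighbours[origin].add(dest)
        neighbours[dest].add(origin)
        # In-strength sums inbound passenger volume per destination
        in_strength[dest] += passengers
    n = len(neighbours)
    degree = {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}
    return degree, dict(in_strength)
```

For example, with the hypothetical routes `[("BR", "PT", 100), ("BR", "ES", 50), ("TH", "DE", 80), ("TH", "PT", 20), ("PT", "ES", 30)]`, PT is connected to three of the four other nodes (degree 0.75) and receives 120 passengers. Per-country values like these would be joined to the importation records as additional predictor columns.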

iBitter-Fuse: A Novel Sequence-Based Bitter Peptide Predictor by Fusing Multi-View Features

International Journal of Molecular Sciences, Vol 22 (16), pp. 8958, 2021. DOI: 10.3390/ijms22168958

Author(s): Phasit Charoenkwan, Chanin Nantasenamat, Md. Mehedi Hasan, Mohammad Ali Moni, Pietro Lio’, ...

Keyword(s): Machine Learning; Large Scale; De Novo; Predictive Performance; Support Vector; Sufficient Information; Self Assessment; Accurate Identification; Bitter Peptides; Accurate Performance

Accurate identification of bitter peptides is of great importance for better understanding their biochemical and biophysical properties. Machine learning-based methods have become an effective way to identify potential bitter peptides in large-scale protein datasets. Although a few machine learning-based predictors have been developed for identifying the bitterness of peptides, their prediction performance could be improved. In this study, we developed a new predictor, named iBitter-Fuse, for more accurate identification of bitter peptides. In iBitter-Fuse, we integrated a variety of feature encoding schemes that provide sufficient information from different aspects, namely compositional information and physicochemical properties. To enhance predictive performance, a customized genetic algorithm utilizing a self-assessment-report (GA-SAR) was employed to identify informative features, and the optimal features were then input into a support vector machine (SVM)-based classifier to develop the final model. Benchmarking experiments based on both 10-fold cross-validation and independent tests indicated that iBitter-Fuse was more accurate than state-of-the-art methods. To facilitate high-throughput identification of bitter peptides, the iBitter-Fuse web server was established and made freely available online. It is anticipated that iBitter-Fuse will be a useful tool for aiding the discovery and de novo design of bitter peptides.
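
Compositional information of the kind mentioned above is often encoded as amino acid composition (AAC) and dipeptide composition (DPC). The sketch below computes both and concatenates them, which is one simple way to fuse feature views. The exact encodings, physicochemical descriptors, and fusion used by iBitter-Fuse are not given in the abstract, so treat this as an assumption; the resulting vector is the kind of input a GA-based feature selector would then prune before the SVM.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(peptide):
    """AAC: fraction of each of the 20 standard amino acids in the peptide."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in peptide.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values()) or 1
    return [counts[aa] / total for aa in AMINO_ACIDS]

def dipeptide_composition(peptide):
    """DPC: fraction of each of the 400 ordered residue pairs."""
    pairs = [a + b for a in AMINO_ACIDS for b in AMINO_ACIDS]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(peptide) - 1):
        pair = peptide[i:i + 2].upper()
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values()) or 1
    return [counts[p] / total for p in pairs]

def fused_features(peptide):
    """Concatenate AAC (20) and DPC (400) into one 420-dimensional vector."""
    return amino_acid_composition(peptide) + dipeptide_composition(peptide)
```

For the tripeptide "AAG", the AAC vector assigns 2/3 to alanine and 1/3 to glycine, and the DPC vector splits its weight equally between the pairs "AA" and "AG".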
