Group Contribution and Machine Learning Approaches to Predict Abraham Solute Parameters, Solvation Free Energy, and Solvation Enthalpy

10.33774/chemrxiv-2021-djd3d ◽

2021 ◽

Author(s):

Yunsie Chung ◽

Florence H. Vermeire ◽

Haoyang Wu ◽

Pierre J. Walker ◽

Michael H. Abraham ◽

...

Keyword(s):

Machine Learning ◽

Free Energy ◽

Prediction Models ◽

Solvation Energy ◽

Solvation Free Energy ◽

Learning Model ◽

Group Contribution ◽

Group Contribution Method ◽

Solvation Enthalpy ◽

Machine Learning Model

We present a group contribution method (SoluteGC) and a machine learning model (SoluteML) to predict the Abraham solute parameters, as well as a machine learning model (DirectML) to predict solvation free energy and enthalpy at 298 K. The proposed group contribution method uses atom-centered functional groups with corrections for ring and polycyclic strain whilst the machine learning models adopt a directed message passing neural network. The solute parameters predicted from SoluteGC and SoluteML are used to calculate solvation energy and enthalpy via linear free energy relationships. Extensive data sets containing 8366 solute parameters, 20253 solvation free energies, and 6322 solvation enthalpies are compiled in this work to train the models. The three models are each evaluated on the same test sets using both random and substructure-based solute splits for solvation energy and enthalpy predictions. The results show that the DirectML model is superior to the SoluteML and SoluteGC models for both predictions and can provide accuracy comparable to that of advanced quantum chemistry methods. Yet, even though the DirectML model performs better in general, all three models are useful for various purposes. Uncertain predicted values can be identified by comparing the three models, and when the three models are combined, they can provide more accurate predictions than any one of them individually. Finally, we present our compiled solute parameter, solvation energy, and solvation enthalpy databases (SoluteDB, dGsolvDBx, dHsolvDB) and provide public access to our final prediction models through a simple web-based tool, software package, and source code.
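As a rough illustration of how predicted solute parameters can feed a linear free energy relationship, the sketch below assumes the widely used Abraham form log K = c + eE + sS + aA + bB + lL for gas-to-solvent partitioning and converts it with ΔG_solv = −RT ln(10) log K. The abstract does not spell out the equations, so the functional form, the function names, and every numerical value here are illustrative assumptions, not entries from SoluteDB or the authors' code.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def log_k(solute, solvent):
    """Abraham-type LFER: log K = c + e*E + s*S + a*A + b*B + l*L."""
    return solvent["c"] + sum(solvent[p.lower()] * solute[p] for p in "ESABL")


def dg_solv(solute, solvent, temperature=298.0):
    """Solvation free energy in kJ/mol from log K (assumed relation)."""
    return -R_KJ_PER_MOL_K * temperature * math.log(10) * log_k(solute, solvent)


# Hypothetical placeholder numbers, chosen only to exercise the code.
example_solute = {"E": 0.5, "S": 0.6, "A": 0.1, "B": 0.4, "L": 3.0}
example_solvent = {"c": 0.2, "e": -0.1, "s": 1.0, "a": 3.0, "b": 1.5, "l": 0.8}

print(f"log K = {log_k(example_solute, example_solvent):.3f}")
print(f"dG_solv = {dg_solv(example_solute, example_solvent):.2f} kJ/mol")
```

A solvation enthalpy could be sketched the same way with a separate set of solvent coefficients fitted to enthalpy data.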


Solvation Free Energy Prediction from Pairwise Atomistic Interactions by Machine Learning

10.21203/rs.3.rs-207945/v1 ◽

2021 ◽

Author(s):

Hyuntae Lim ◽

YounJoon Jung

Keyword(s):

Machine Learning ◽

Free Energy ◽

Chemical Properties ◽

Solvation Energy ◽

Solvation Free Energy ◽

Learning Technologies ◽

Training Data ◽

Inner Product ◽

Structure Property ◽

Energy Prediction

Abstract Recent advances in machine learning technologies and their applications have led to the development of diverse structure-property relationship models for crucial chemical properties. The solvation free energy is one of them. Here, we introduce a novel ML-based solvation model, which calculates the solvation energy from pairwise atomistic interactions. The novelty of the proposed model consists of a simple architecture: two encoding functions extract atomic feature vectors from the given chemical structure, while the inner product between the two atomistic feature vectors calculates their interactions. The results of 6,493 experimental measurements achieve outstanding performance and transferability for enlarging training data owing to its solvent-non-specific nature. An analysis of the interaction map shows that our model has significant potential for producing group contributions on the solvation energy, which indicates that the model provides not only predictions of target properties but also more detailed physicochemical insights.
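The pairwise inner-product idea can be pictured with a small NumPy sketch. Random matrices stand in for the two learned encoders, and summing the interaction map into a single scalar is an assumed read-out, since the abstract does not state how the pairwise terms are combined. All names, shapes, and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def encode(num_atoms, dim, rng):
    """Stand-in for a learned encoder: one feature vector per atom."""
    return rng.normal(size=(num_atoms, dim))


solvent_features = encode(num_atoms=3, dim=8, rng=rng)  # shape (3, 8)
solute_features = encode(num_atoms=5, dim=8, rng=rng)   # shape (5, 8)

# Entry (i, j) is the inner product of solvent atom i with solute atom j.
interaction_map = solvent_features @ solute_features.T  # shape (3, 5)

# Assumed read-out: sum all pairwise terms into one scalar.
predicted_energy = interaction_map.sum()

# Column sums give a per-solute-atom breakdown of that scalar.
solute_atom_contributions = interaction_map.sum(axis=0)  # shape (5,)
```

Because the scalar is a plain sum over atom pairs, it decomposes exactly into per-atom or per-group terms, which is one way to read the abstract's remark about group contributions.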


Covid-19 Analysis and Prediction using Data Science and Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39272 ◽

2021 ◽

Vol 9 (12) ◽

pp. 303-307

Author(s):

Akshata Kulkarni

Keyword(s):

Machine Learning ◽

Data Science ◽

Information Dissemination ◽

Prediction Models ◽

Learning Model ◽

Future Trend ◽

Control Measures ◽

Machine Learning Model ◽

Using Data ◽

Novel Coronavirus

Abstract: Officials around the world are using several COVID-19 outbreak prediction models to make educated decisions and enact necessary control measures. In this study, we developed a Machine Learning model which predicts and forecasts the COVID-19 outbreak in India, with the goal of determining the best regression model for an in-depth examination of the novel coronavirus. Based on data available from January 31 to October 31, 2020, collected from Kaggle, this model predicts the number of confirmed cases in Maharashtra. We're using a Machine Learning model to foresee the future trend of these situations. The project has the potential to demonstrate the importance of information dissemination in improving response time and planning ahead of time to help reduce risk.


Quantitative Toxicity Prediction via Ensembling of Heterogeneous Predictors

10.21203/rs.2.19338/v1 ◽

2019 ◽

Author(s):

Abdul Karim ◽

Vahid Riahi ◽

Avinash Mishra ◽

Abdollah Dehzangi ◽

M. A. Hakim Newton ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Prediction Models ◽

Individual Performance ◽

Learning Model ◽

Data Representation ◽

Toxicity Prediction ◽

Machine Learning Model ◽

Machine Learning Approach ◽

Benchmark Datasets

Abstract Representing molecules in the form of only one type of features and using those features to predict their activities is one of the most important approaches for machine-learning-based chemical-activity-prediction. For molecular activities like quantitative toxicity prediction, the performance depends on the type of features extracted and the machine learning approach used. For such cases, using one type of features and machine learning model restricts the prediction performance to the specific representation and model used. In this paper, we study quantitative toxicity prediction and propose a machine learning model for the same. Our model uses an ensemble of heterogeneous predictors instead of typically using homogeneous predictors. The predictors that we use vary either on the type of features used or on the deep learning architecture employed. Each of these predictors presumably has its own strengths and weaknesses in terms of toxicity prediction. Our motivation is to make a combined model that utilizes different types of features and architectures to obtain better collective performance that could go beyond the performance of each individual predictor. We use six predictors in our model and test the model on four standard quantitative toxicity benchmark datasets. Experimental results show that our model outperforms the state-of-the-art toxicity prediction models in 8 out of 12 accuracy measures. Our experiments show that ensembling heterogeneous predictors improves the performance over single predictors and homogeneous ensembling of single predictors. The results show that each data representation or deep learning based predictor has its own strengths and weaknesses, thus employing a model ensembling multiple heterogeneous predictors could go beyond individual performance of each data representation or each predictor type.
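A minimal sketch of the ensembling idea follows. The abstract does not say how the six predictors are combined, so the unweighted mean used here is an assumption, and the toy predictors are hypothetical stand-ins for models built on different features or architectures.

```python
import numpy as np


def ensemble_predict(predictors, x):
    """Average the outputs of several predictors (assumed combination rule)."""
    stacked = np.stack([predict(x) for predict in predictors])
    return stacked.mean(axis=0)


# Hypothetical toy predictors that share a trend but err in different directions.
predictors = [
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 0.3,
    lambda x: 2.0 * x - 0.3,
]

x = np.array([0.0, 1.0, 2.0])
combined = ensemble_predict(predictors, x)
print(combined)
```

In this toy case the opposite biases cancel in the mean, which is the intuition behind expecting an ensemble to beat its individual members.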


Predicting lethal courses in critically ill COVID-19 patients using a machine learning model trained on patients with non-COVID-19 viral pneumonia

Scientific Reports ◽

10.1038/s41598-021-92475-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Gregor Lichtner ◽

Felix Balzer ◽

Stefan Haufe ◽

Niklas Giesa ◽

Fridtjof Schiefenhövel ◽

...

Keyword(s):

Machine Learning ◽

Critically Ill ◽

Prediction Models ◽

Predictive Performance ◽

Learning Model ◽

Mortality Prediction ◽

Viral Pneumonia ◽

Machine Learning Model ◽

Mortality Prediction Models ◽

Time Courses

Abstract In a pandemic with a novel disease, disease-specific prognosis models are available only with a delay. To bridge the critical early phase, models built for similar diseases might be applied. To test the accuracy of such a knowledge transfer, we investigated how precisely lethal courses in critically ill COVID-19 patients can be predicted by a model trained on critically ill non-COVID-19 viral pneumonia patients. We trained gradient boosted decision tree models on 718 (245 deceased) non-COVID-19 viral pneumonia patients to predict individual ICU mortality and applied them to 1054 (369 deceased) COVID-19 patients. Our model showed a significantly better predictive performance (AUROC 0.86 [95% CI 0.86–0.87]) than the clinical scores APACHE2 (0.63 [95% CI 0.61–0.65]), SAPS2 (0.72 [95% CI 0.71–0.74]) and SOFA (0.76 [95% CI 0.75–0.77]), the COVID-19-specific mortality prediction models of Zhou (0.76 [95% CI 0.73–0.78]) and Wang (laboratory: 0.62 [95% CI 0.59–0.65]; clinical: 0.56 [95% CI 0.55–0.58]) and the 4C COVID-19 Mortality score (0.71 [95% CI 0.70–0.72]). We conclude that lethal courses in critically ill COVID-19 patients can be predicted by a machine learning model trained on non-COVID-19 patients. Our results suggest that in a pandemic with a novel disease, prognosis models built for similar diseases can be applied, even when the diseases differ in time courses and in rates of critical and lethal courses.


An Easy-to-Use Machine Learning Model to Predict the Prognosis of Patients With COVID-19: Retrospective Cohort Study (Preprint)

10.2196/preprints.24225 ◽

2020 ◽

Author(s):

Hyung-Jun Kim ◽

Deokjae Han ◽

Jeong-Han Kim ◽

Daehyun Kim ◽

Beomman Ha ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Prediction Models ◽

Discrimination Performance ◽

Learning Model ◽

Smoking History ◽

Case Report Form ◽

Validation Group ◽

Radiographic Findings ◽

Machine Learning Model

BACKGROUND Prioritizing patients in need of intensive care is necessary to reduce the mortality rate during the COVID-19 pandemic. Although several scoring methods have been introduced, many require laboratory or radiographic findings that are not always easily available. OBJECTIVE The purpose of this study was to develop a machine learning model that predicts the need for intensive care for patients with COVID-19 using easily obtainable characteristics: baseline demographics, comorbidities, and symptoms. METHODS A retrospective study was performed using a nationwide cohort in South Korea. Patients admitted to 100 hospitals from January 25, 2020, to June 3, 2020, were included. Patient information was collected retrospectively by the attending physicians in each hospital and uploaded to an online case report form. Variables that could be easily provided were extracted. The variables were age, sex, smoking history, body temperature, comorbidities, activities of daily living, and symptoms. The primary outcome was the need for intensive care, defined as admission to the intensive care unit, use of extracorporeal life support, mechanical ventilation, vasopressors, or death within 30 days of hospitalization. Patients admitted until March 20, 2020, were included in the derivation group to develop prediction models using an automated machine learning technique. The models were externally validated in patients admitted after March 21, 2020. The machine learning model with the best discrimination performance was selected and compared against the CURB-65 (confusion, urea, respiratory rate, blood pressure, and 65 years of age or older) score using the area under the receiver operating characteristic curve (AUC). RESULTS A total of 4787 patients were included in the analysis, of which 3294 were assigned to the derivation group and 1493 to the validation group. Among the 4787 patients, 460 (9.6%) patients needed intensive care. Of the 55 machine learning models developed, the XGBoost model revealed the highest discrimination performance. The AUC of the XGBoost model was 0.897 (95% CI 0.877-0.917) for the derivation group and 0.885 (95% CI 0.855-0.915) for the validation group. Both the AUCs were superior to those of CURB-65, which were 0.836 (95% CI 0.825-0.847) and 0.843 (95% CI 0.829-0.857), respectively. CONCLUSIONS We developed a machine learning model comprising simple patient-provided characteristics, which can efficiently predict the need for intensive care among patients with COVID-19.


Adaptive Hybrid Machine Learning Model For Forecasting The Step-Like Displacements of Reservoir Colluvial Landslides: A Case Study in The Three Gorges Reservoir Area, China

10.21203/rs.3.rs-217782/v1 ◽

2021 ◽

Author(s):

Li Linwei ◽

Yiping Wu ◽

Miao Fasheng ◽

Xue Yang ◽

Huang Yepiao

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Warning System ◽

Learning Model ◽

Model Complexity ◽

Three Gorges Reservoir Area ◽

Gray Wolf ◽

Displacement Prediction ◽

Machine Learning Model ◽

Hybrid Machine

Abstract Constructing an accurate and stable displacement prediction model is essential to build a capable early warning system for landslide disasters. To overcome the drawbacks of previous displacement prediction models for step-like landslides, such as the incomplete or excessive decompositions of cumulative displacements and input factors and the redundancy or lack of input factors, we propose an adaptive hybrid machine learning model. This model is composed of three parts. First, candidate factors are proposed based on the macroscopic deformation response of landslides. Then, the landslide displacement and its candidate factors are adaptively decomposed into different displacement and factor components by applying optimized variational mode decomposition (OVMD). Second, in the gray wolf optimizer-based kernel extreme learning machine (GWO-KELM) model, the global sensitivity analysis (GSA) of the prediction results of different displacement components to each decomposed factor is analyzed based on the PAWN method. Then, the decomposed factors are reduced according to the GSA results. Third, based on the reduced factors, the optimal GWO-KELM models of the different displacement components are established to predict the displacement. Taking the Baishuihe landslide as an example, we used the raw data of three representative monitoring sites from June 2006 to December 2016 to verify the validity, accuracy, and stability of the model. The results indicate that the proposed hybrid model can effectively determine the displacement decomposition parameters. In addition, this model performed well over a three-year forecast with low model complexity.


Mechanical Ventilator Parameter Estimation for Lung Health through Machine Learning

Bioengineering ◽

10.3390/bioengineering8050060 ◽

2021 ◽

Vol 8 (5) ◽

pp. 60

Author(s):

Sanjay Sarma Oruganti Venkata ◽

Amie Koenig ◽

Ramana M. Pidaparti

Keyword(s):

Machine Learning ◽

Mechanical Ventilation ◽

Prediction Models ◽

Particle Swarm ◽

Learning Model ◽

Good Prediction ◽

Machine Learning Model ◽

New Variant ◽

Target Values ◽

Lung Health

Patients whose lungs are compromised due to various respiratory health concerns require mechanical ventilation for support in breathing. Different mechanical ventilation settings are selected depending on the patient’s lung condition, and the selection of these parameters depends on the observed patient response and experience of the clinicians involved. To support this decision-making process for clinicians, good prediction models are always beneficial in improving the setting accuracy, reducing treatment error, and quickly weaning patients off the ventilation support. In this study, we developed a machine learning model for estimation of the mechanical ventilation parameters for lung health. The model is based on inverse mapping of artificial neural networks with the Graded Particle Swarm Optimizer. In this new variant, we introduced grouping and hierarchy in the swarm in addition to the general rules of particle swarm optimization to further improve its prediction performance of the mechanical ventilation parameters. The machine learning model was trained and tested using clinical data from canine and feline patients at the University of Georgia College of Veterinary Medicine. Our model successfully generated a range of parameter values for the mechanical ventilation applied on test data, with the average prediction values over multiple trials close to the target values. Overall, the developed machine learning model should be able to predict the mechanical ventilation settings for various respiratory conditions for patients’ survival once the relevant data are available.


MLSolvA: solvation free energy prediction from pairwise atomistic interactions by machine learning

Journal of Cheminformatics ◽

10.1186/s13321-021-00533-z ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Hyuntae Lim ◽

YounJoon Jung

Keyword(s):

Machine Learning ◽

Free Energy ◽

Solvation Energy ◽

Solvation Free Energy ◽

Learning Technologies ◽

Training Data ◽

Inner Product ◽

Structure Property ◽

Energy Prediction ◽

Feature Vectors

Abstract Recent advances in machine learning technologies and their applications have led to the development of diverse structure–property relationship models for crucial chemical properties. The solvation free energy is one of them. Here, we introduce a novel ML-based solvation model, which calculates the solvation energy from pairwise atomistic interactions. The novelty of the proposed model consists of a simple architecture: two encoding functions extract atomic feature vectors from the given chemical structure, while the inner product between the two atomistic feature vectors calculates their interactions. The results of 6239 experimental measurements achieve outstanding performance and transferability for enlarging training data owing to its solvent-non-specific nature. An analysis of the interaction map shows that our model has significant potential for producing group contributions on the solvation energy, which indicates that the model provides not only predictions of target properties but also more detailed physicochemical insights.


An Easy-to-Use Machine Learning Model to Predict the Prognosis of Patients With COVID-19: Retrospective Cohort Study

Journal of Medical Internet Research ◽

10.2196/24225 ◽

2020 ◽

Vol 22 (11) ◽

pp. e24225

Author(s):

Hyung-Jun Kim ◽

Deokjae Han ◽

Jeong-Han Kim ◽

Daehyun Kim ◽

Beomman Ha ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Prediction Models ◽

Discrimination Performance ◽

Learning Model ◽

Smoking History ◽

Case Report Form ◽

Validation Group ◽

Radiographic Findings ◽

Machine Learning Model

Background Prioritizing patients in need of intensive care is necessary to reduce the mortality rate during the COVID-19 pandemic. Although several scoring methods have been introduced, many require laboratory or radiographic findings that are not always easily available. Objective The purpose of this study was to develop a machine learning model that predicts the need for intensive care for patients with COVID-19 using easily obtainable characteristics: baseline demographics, comorbidities, and symptoms. Methods A retrospective study was performed using a nationwide cohort in South Korea. Patients admitted to 100 hospitals from January 25, 2020, to June 3, 2020, were included. Patient information was collected retrospectively by the attending physicians in each hospital and uploaded to an online case report form. Variables that could be easily provided were extracted. The variables were age, sex, smoking history, body temperature, comorbidities, activities of daily living, and symptoms. The primary outcome was the need for intensive care, defined as admission to the intensive care unit, use of extracorporeal life support, mechanical ventilation, vasopressors, or death within 30 days of hospitalization. Patients admitted until March 20, 2020, were included in the derivation group to develop prediction models using an automated machine learning technique. The models were externally validated in patients admitted after March 21, 2020. The machine learning model with the best discrimination performance was selected and compared against the CURB-65 (confusion, urea, respiratory rate, blood pressure, and 65 years of age or older) score using the area under the receiver operating characteristic curve (AUC). Results A total of 4787 patients were included in the analysis, of which 3294 were assigned to the derivation group and 1493 to the validation group. Among the 4787 patients, 460 (9.6%) patients needed intensive care. Of the 55 machine learning models developed, the XGBoost model revealed the highest discrimination performance. The AUC of the XGBoost model was 0.897 (95% CI 0.877-0.917) for the derivation group and 0.885 (95% CI 0.855-0.915) for the validation group. Both the AUCs were superior to those of CURB-65, which were 0.836 (95% CI 0.825-0.847) and 0.843 (95% CI 0.829-0.857), respectively. Conclusions We developed a machine learning model comprising simple patient-provided characteristics, which can efficiently predict the need for intensive care among patients with COVID-19.
