A predictive performance comparison of machine learning models for judicial cases

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors of this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activity. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinities that correlate significantly with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.
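
The abstract above judges scoring functions by how well their predictions correlate with experimental binding data, but it does not say which correlation coefficient was used. As a minimal sketch, assuming a Spearman rank correlation is an acceptable measure, the comparison could look like the following. The function names are hypothetical and chosen only for illustration.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(predicted, experimental):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(experimental))
```

A scoring function whose predicted affinities rank the ligands in the same order as the experimental values would give a coefficient near 1, while one with no monotonic relationship would give a value near 0.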

Temporal and Spatial Autocorrelation as Determinants of Regional AOD-PM2.5 Model Performance in the Middle East

Remote Sensing ◽

10.3390/rs13183790 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3790

Author(s):

Khang Chau ◽

Meredith Franklin ◽

Huikyo Lee ◽

Michael Garay ◽

Olga Kalashnikova

Keyword(s):

Machine Learning ◽

Middle East ◽

United Arab Emirates ◽

Atmospheric Correction ◽

Predictive Performance ◽

Variable Importance ◽

Learning Models ◽

Median Test ◽

Temporal And Spatial ◽

Machine Learning Models

Exposure to fine particulate matter (PM2.5) air pollution has been shown in numerous studies to be associated with detrimental health effects. However, the ability to conduct epidemiological assessments can be limited due to challenges in generating reliable PM2.5 estimates, particularly in parts of the world such as the Middle East where measurements are scarce and extreme meteorological events such as sandstorms are frequent. In order to supplement exposure modeling efforts under such conditions, satellite-retrieved aerosol optical depth (AOD) has proven to be useful due to its global coverage. By using AODs from the Multiangle Implementation of Atmospheric Correction (MAIAC) of the MODerate Resolution Imaging Spectroradiometer (MODIS) and the Multiangle Imaging Spectroradiometer (MISR) combined with meteorological and assimilated aerosol information from the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2), we constructed machine learning models to predict PM2.5 in the area surrounding the Persian Gulf, including Kuwait, Bahrain, and the United Arab Emirates (U.A.E.). Our models showed regional differences in predictive performance, with better results in the U.A.E. (median test R² = 0.66) than Kuwait (median test R² = 0.51). Variable importance also differed by region, where satellite-retrieved AOD variables were more important for predicting PM2.5 in Kuwait than in the U.A.E. Divergent trends in the temporal and spatial autocorrelations of PM2.5 and AOD in the two regions offered possible explanations for differences in predictive performance and variable importance. In a test of model transferability, we found that models trained in one region and applied to another did not predict PM2.5 well, even if the transferred model had better performance. Overall, the results of our study suggest that models developed over large geographic areas could generate PM2.5 estimates with greater uncertainty than could be obtained by taking a regional modeling approach. Furthermore, development of methods to better incorporate spatial and temporal autocorrelations in machine learning models warrants further examination.
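
Two quantities carry the argument of this abstract: the test R² used to compare regions and the temporal autocorrelation of the PM2.5 and AOD series. The sketch below shows one common way to compute each. It is not the authors' code, and the estimator choices (the standard coefficient of determination and the usual sample autocorrelation at a fixed lag) are assumptions.

```python
from statistics import mean

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of an evenly spaced time series at a given lag."""
    n = len(series)
    m = mean(series)
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return num / denom
```

A series whose lag-1 autocorrelation is high carries much of yesterday's information into today, which is one way two regions can differ in how much a model gains from satellite AOD compared with recent history.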

Performance Comparison of Machine Learning Models for Diabetes Prediction

2021 29th Signal Processing and Communications Applications Conference (SIU) ◽

10.1109/siu53274.2021.9477824 ◽

2021 ◽

Author(s):

Pinar Cihan ◽

Hakan Coskun

Keyword(s):

Machine Learning ◽

Performance Comparison ◽

Learning Models ◽

Diabetes Prediction ◽

Machine Learning Models

Performance Comparison of Machine Learning Models Trained on Manual vs ASR Transcriptions for Dialogue Act Annotation

2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI) ◽

10.1109/ictai.2018.00156 ◽

2018 ◽

Cited By ~ 1

Author(s):

Usman Malik ◽

Mukesh Barange ◽

Julien Saunier ◽

Alexandre Pauchet

Keyword(s):

Machine Learning ◽

Performance Comparison ◽

Learning Models ◽

Machine Learning Models

Development of Combined Heavy Rain Damage Prediction Models with Machine Learning

Water ◽

10.3390/w11122516 ◽

2019 ◽

Vol 11 (12) ◽

pp. 2516 ◽

Cited By ~ 1

Author(s):

Changhyun Choi ◽

Jeonghwan Kim ◽

Jungwook Kim ◽

Hung Soo Kim

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Prediction Model ◽

Prediction Models ◽

Predictive Performance ◽

Heavy Rain ◽

Learning Models ◽

Damage Prediction ◽

Natural Disaster Management ◽

Machine Learning Models

Adequate forecasting and preparation for heavy rain can minimize life and property damage. Some studies have been conducted on the heavy rain damage prediction model (HDPM); however, most of their models are limited to a linear regression model that simply explains the linear relation between rainfall data and damage. This study develops the combined heavy rain damage prediction model (CHDPM), in which a residual prediction model (RPM) is added to the HDPM. The predictive performance of the CHDPM was found to be 4–14% higher than that of the HDPM. This confirms that the predictive performance of the model is improved by combining it with a machine learning RPM that compensates for the linearity of the HDPM. The results of this study can be used as basic data for natural disaster management.
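
The combined model described above adds a residual prediction model to a linear one. The sketch below illustrates that structure only. The abstract does not name the learner used for the residuals, so a k-nearest-neighbour average is swapped in here as a stand-in, and the rainfall and damage numbers are invented toy data.

```python
from statistics import mean

def fit_linear(x, y):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def knn_residual(train_x, train_resid, query, k=2):
    """Average the residuals of the k training points closest to query."""
    pairs = sorted(zip(train_x, train_resid), key=lambda p: abs(p[0] - query))
    return mean(r for _, r in pairs[:k])

def fit_combined(x, y, k=2):
    """Fit the linear model, then a residual model on what it leaves unexplained."""
    slope, intercept = fit_linear(x, y)
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]

    def predict(query):
        base = slope * query + intercept                      # linear part
        return base + knn_residual(x, residuals, query, k)    # plus residual part

    return (slope, intercept), predict

# Hypothetical toy data: damage grows nonlinearly with rainfall.
rainfall = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
damage = [r ** 2 for r in rainfall]
(slope, intercept), predict = fit_combined(rainfall, damage, k=2)

linear_only = slope * 2.5 + intercept   # about 9.17, true value is 6.25
combined = predict(2.5)                 # about 6.5
```

On this toy curve the residual term pulls the prediction at an unseen rainfall value much closer to the true value than the straight line alone, which is the effect the abstract attributes to the RPM. A real evaluation would use held-out events rather than a single query point.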

Significance Of Multilayer Perceptron Model For Early Detection Of Diabetes Over ML Methods

Journal of University of Shanghai for Science and Technology ◽

10.51201/jusst/21/08358 ◽

2021 ◽

Vol 23 (08) ◽

pp. 148-160

Author(s):

Dr. V.Vasudha Rani ◽

Dr. G. Vasavi ◽

Dr. K.R.N Kiran Kumar ◽

...

Keyword(s):

Machine Learning ◽

Multilayer Perceptron ◽

Predictive Analytics ◽

Early Stage ◽

Feature Selection Method ◽

Health Condition ◽

Performance Comparison ◽

Learning Approaches ◽

Learning Models ◽

Machine Learning Models

Diabetes is one of the world's chronic diseases. Every year, millions of people suffer from several other health issues caused by diabetes. Diabetes has three stages, namely type 2, type 1, and insulin. Curing diabetes at later stages is practically difficult. In this paper, we propose a DNN model and compare its performance with several machine learning models for predicting the disease at an early stage based on the current health condition of the patient. An artificial neural network (ANN) is a predictive model designed to work the same way a human brain does, and it works better with larger datasets. With the concept of hidden layers, neural networks work better at predictive analytics and can make predictions with more accuracy. The novelty of this work lies in the integration of a feature selection method used to optimize the Multilayer Perceptron (MLP) to reduce the number of required input attributes. The results achieved using this method are compared with those of several conventional machine learning approaches such as Logistic Regression and Random Forest Classifier (RFC). The proposed DNN method is shown to achieve better accuracy than the machine learning models for early-stage detection of diabetes. This work is applicable to clinical support as a tool for doctors and physicians to make preliminary decisions.

Performance Comparison of Machine Learning Models for Classification of Traffic Injury Severity from Imbalanced Accident Dataset

Intelligence in Big Data Technologies—Beyond the Hype - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-15-5285-4_36 ◽

2020 ◽

pp. 361-369

Author(s):

P. Joyce Beryl Princess ◽

Salaja Silas ◽

Elijah Blessing Rajsingh

Keyword(s):

Machine Learning ◽

Injury Severity ◽

Performance Comparison ◽

Learning Models ◽

Traffic Injury ◽

Machine Learning Models

Machine Learning to Predict 10-year Cardiovascular Mortality from the Electrocardiogram: Analysis of the Third National Health and Nutrition Examination Survey (NHANES III)

10.1101/2021.09.09.21263327 ◽

2021 ◽

Author(s):

Chang H Kim ◽

Sadeer Al-Kindi ◽

Yasir Tarabichi ◽

Suril Gohel ◽

Riddhi Vyas ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Mortality ◽

Predictive Performance ◽

NHANES III ◽

Nutrition Examination Survey ◽

Learning Models ◽

The Third ◽

Health And Nutrition ◽

Machine Learning Models ◽

ECG Data

Background: The value of the electrocardiogram (ECG) for predicting long-term cardiovascular outcomes is not well defined. Machine learning methods are well suited for analysis of highly correlated data such as that from the ECG. Methods: Using demographic, clinical, and 12-lead ECG data from the Third National Health and Nutrition Examination Survey (NHANES III), machine learning models were trained to predict 10-year cardiovascular mortality in ambulatory U.S. adults. Predictive performance of each model was assessed using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, and specificity. These were compared to the 2013 American College of Cardiology/American Heart Association Pooled Cohort Equations (PCE). Results: 7,067 study participants (mean age: 59.2 ± 13.4 years, female: 52.5%, white: 73.9%, black: 23.3%) were included. At 10 years of follow-up, 338 (4.8%) had died from cardiac causes. Compared to the PCE (AUROC: 0.668, AUPRC: 0.125, sensitivity: 0.492, specificity: 0.859), machine learning models only required demographic and ECG data to achieve comparable performance: logistic regression (AUROC: 0.754, AUPRC: 0.141, sensitivity: 0.747, specificity: 0.759), neural network (AUROC: 0.764, AUPRC: 0.149, sensitivity: 0.722, specificity: 0.787), and ensemble model (AUROC: 0.695, AUPRC: 0.166, sensitivity: 0.468, specificity: 0.912). Additional clinical data did not improve the predictive performance of machine learning models. In variable importance analysis, important ECG features clustered in inferior and lateral leads. Conclusions: Machine learning can be applied to demographic and ECG data to predict 10-year cardiovascular mortality in ambulatory adults, with potentially important implications for primary prevention.
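
The metrics reported above (AUROC, sensitivity, specificity) can be computed directly from labels and predicted risk scores. The following is a minimal sketch under stated assumptions, not the study's evaluation code: AUROC is computed as the fraction of positive/negative pairs ranked correctly, the decision threshold is an arbitrary illustrative value, and the labels and scores are invented.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold):
    """Classify score >= threshold as positive, then return (sens, spec)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels (1 = died of cardiac causes) and model risk scores.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

area = auroc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
```

AUROC is independent of the threshold, while sensitivity and specificity trade off against each other as the threshold moves, which is why a model can have a lower AUROC than another yet a higher specificity.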

Robust Performance of Potentially Functional SNPs in Machine Learning Models for the Prediction of Atorvastatin-Induced Myalgia

Frontiers in Pharmacology ◽

10.3389/fphar.2021.605764 ◽

2021 ◽

Vol 12 ◽

Author(s):

Brandon N. S. Ooi ◽

Raechell ◽

Ariel F. Ying ◽

Yong Zher Koh ◽

Yu Jin ◽

...

Keyword(s):

Machine Learning ◽

Predictive Performance ◽

Whole Genome ◽

Learning Models ◽

Association Analyses ◽

Functional SNPs ◽

Individual SNPs ◽

Genome Association ◽

Whole Genome Association ◽

Machine Learning Models

Background: Statins can cause muscle symptoms resulting in poor adherence to therapy and increased cardiovascular risk. We hypothesize that combinations of potentially functional SNPs (pfSNPs), rather than individual SNPs, better predict myalgia in patients on atorvastatin. This study assesses the value of potentially functional single nucleotide polymorphisms (pfSNPs) and employs six machine learning algorithms to identify the combination of SNPs that best predicts myalgia. Methods: Whole genome sequencing of 183 Chinese, Malay and Indian patients from Singapore was conducted to identify genetic variants associated with atorvastatin-induced myalgia. To adjust for confounding factors, demographic and clinical characteristics were also examined for their association with myalgia. The top factor, sex, was then used as a covariate in the whole genome association analyses. Variants that were highly associated with myalgia from this and previous studies were extracted, assessed for potential functionality (pfSNPs) and incorporated into six machine learning models. Predictive performance of combinations of different models and inputs was compared using the average cross-validation area under the ROC curve (AUC). The minimum combination of SNPs that achieves maximum sensitivity and specificity, as determined by AUC, and that predicts atorvastatin-induced myalgia in most, if not all, of the six machine learning models was determined. Results: Through whole genome association analyses using sex as a covariate, a larger proportion of pfSNPs compared to non-pf SNPs were found to be highly associated with myalgia. Although none of the individual SNPs achieved genome-wide significance in univariate analyses, machine learning models identified a combination of 15 SNPs that predict myalgia with good predictive performance (AUC >0.9). SNPs within genes identified in this study significantly outperformed SNPs within genes previously reported to be associated with myalgia. pfSNPs were found to be more robust in predicting myalgia, outperforming non-pf SNPs in the majority of machine learning models tested. Conclusion: Combinations of pfSNPs that were consistently identified by different machine learning models to have high predictive performance have good potential to be clinically useful for predicting atorvastatin-induced myalgia once validated against an independent cohort of patients.

Empirical asset pricing via machine learning: evidence from the European stock market

Journal of Asset Management ◽

10.1057/s41260-021-00237-x ◽

2021 ◽

Author(s):

Wolfgang Drobetz ◽

Tizian Otto

Keyword(s):

Machine Learning ◽

Stock Returns ◽

Network Architecture ◽

Risk Measures ◽

Predictive Performance ◽

Support Vector ◽

Learning Models ◽

Learning Methods ◽

Machine Learning Methods ◽

Machine Learning Models

This paper evaluates the predictive performance of machine learning methods in forecasting European stock returns. Compared to a linear benchmark model, interactions and nonlinear effects help improve the predictive performance. But machine learning models must be adequately trained and tuned to overcome the high dimensionality problem and to avoid overfitting. Across all machine learning methods, the most important predictors are based on price trends and fundamental signals from valuation ratios. However, the models exhibit substantial variation in statistical predictive performance that translates into pronounced differences in economic profitability. The return and risk measures of long-only trading strategies indicate that machine learning models produce sizeable gains relative to our benchmark. Neural networks perform best, also after accounting for transaction costs. A classification-based portfolio formation, utilizing a support vector machine that avoids estimating stock-level expected returns, performs even better than the neural network architecture.
