A community-powered search of machine learning strategy space to find NMR property prediction models

PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0253612
Author(s):  
Lars A. Bratholm ◽  
Will Gerrard ◽  
Brandon Anderson ◽  
Shaojie Bai ◽  
Sunghwan Choi ◽  
...  

The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition which received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with comparable accuracy to our best previously published ‘in-house’ efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy which exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties.
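The meta-ensemble idea in this abstract, a linear combination of several models' predictions, can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' procedure: the three stand-in "models", their noise levels, and the helper name `fit_linear_ensemble` are assumptions, and the weights are fitted by ordinary least squares on the same data they are evaluated on.

```python
import numpy as np


def fit_linear_ensemble(preds, target):
    """Return least-squares weights for combining model predictions.

    preds  : array of shape (n_samples, n_models), one column per model
    target : array of shape (n_samples,), the reference values
    """
    weights, *_ = np.linalg.lstsq(preds, target, rcond=None)
    return weights


def mse(pred, target):
    """Mean squared error between a prediction vector and the target."""
    return float(np.mean((pred - target) ** 2))


# Synthetic stand-in data (assumption): three models whose errors are
# independent Gaussian noise of different magnitude.
rng = np.random.default_rng(0)
target = rng.normal(size=200)
preds = np.column_stack(
    [target + rng.normal(scale=s, size=200) for s in (0.3, 0.4, 0.5)]
)

weights = fit_linear_ensemble(preds, target)
ensemble = preds @ weights  # weighted sum of the model columns

individual_errors = [mse(preds[:, j], target) for j in range(preds.shape[1])]
ensemble_error = mse(ensemble, target)
print("weights:", weights)
print("individual MSE:", individual_errors)
print("ensemble MSE:", ensemble_error)
```

On the fitting data, the least-squares combination cannot have a higher squared error than any single model, because using one model alone is just a special choice of weights. In practice the weights would normally be fitted on held-out predictions to avoid overfitting.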

2020 ◽  
Vol 14 (2) ◽  
pp. 140-159
Author(s):  
Anthony-Paul Cooper ◽  
Emmanuel Awuni Kolog ◽  
Erkki Sutinen

This article builds on previous research around the exploration of the content of church-related tweets. It does so by exploring whether the qualitative thematic coding of such tweets can, in part, be automated by the use of machine learning. It compares three supervised machine learning algorithms to understand how useful each algorithm is at a classification task, based on a dataset of human-coded church-related tweets. The study finds that one such algorithm, Naïve-Bayes, performs better than the other algorithms considered, returning Precision, Recall and F-measure values which each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time where the high volume of social media data, in this case, Twitter data, means that the resource-intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.
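For readers unfamiliar with the three reported metrics, the following sketch shows how precision, recall and the F-measure are computed when a classifier's labels are compared with human codes. It assumes a single binary theme (1 = tweet carries the theme, 0 = it does not); the function name and the five example labels are invented for illustration and are not taken from the study's dataset.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Invented example: human-assigned codes vs. classifier output.
human_codes = [1, 1, 0, 0, 1]
model_codes = [1, 0, 0, 1, 1]

precision, recall, f1 = precision_recall_f1(human_codes, model_codes)
print(precision, recall, f1)
```

A multi-theme coding scheme would typically compute these per theme and then average them; that step is omitted here.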


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Fathima Aliyar Vellameeran ◽  
Thomas Brindha

Abstract. Objectives: To provide a clear literature review of state-of-the-art heart disease prediction models. Methods: The review covers 61 research papers and reports the significant findings of the analysis. First, the analysis addresses the contribution of each work and notes its simulation environment. It then surveys the different types of machine learning algorithms deployed in each contribution, along with the datasets used by existing heart disease prediction models. Results: The performance measures reported across the papers, such as prediction accuracy, prediction error, specificity, sensitivity, and F-measure, are examined. The best reported performance is also checked to confirm the effectiveness of the contributions. Conclusions: The research challenges and gaps are outlined with respect to the development of intelligent methods for the unresolved challenges in heart disease prediction using data mining techniques.


2020 ◽  
Vol 9 (12) ◽  
pp. 732
Author(s):  
Hongjie Yu ◽  
Lin Liu ◽  
Bo Yang ◽  
Minxuan Lan

Crime prediction using machine learning and data fusion assimilation has become a hot topic. Most of the models rely on historical crime data and related environment variables. The activity of potential offenders affects crime patterns, but fine-resolution data on that activity have not been applied in crime prediction. The goal of this study is to test the effect of potential offenders' activity on crime prediction by incorporating such data into the prediction models and assessing the prediction accuracies. This study uses the movement data of past offenders collected in routine police stop-and-question operations to infer the movement of future offenders. The offender movement data complement historical crime data in a Spatio-Temporal Cokriging (ST-Cokriging) model for crime prediction. The models are implemented for weekly, biweekly, and quad-weekly prediction in the XT police district of ZG city, China. Results with the incorporation of the offender movement data are consistently better than those without it. The improvement is most pronounced for the weekly model, followed by the biweekly model and then the quad-weekly model. In sum, the addition of offender movement data enhances crime prediction, especially for short periods.


Endocrine ◽  
2021 ◽  
Author(s):  
Olivier Zanier ◽  
Matteo Zoli ◽  
Victor E. Staartjes ◽  
Federica Guaraldi ◽  
Sofia Asioli ◽  
...  

Abstract. Purpose: Biochemical remission (BR), gross total resection (GTR), and intraoperative cerebrospinal fluid (CSF) leaks are important metrics in transsphenoidal surgery for acromegaly, and prediction of their likelihood using machine learning would be clinically advantageous. We aim to develop and externally validate clinical prediction models for outcomes after transsphenoidal surgery for acromegaly. Methods: Using data from two registries, we develop and externally validate machine learning models for GTR, BR, and CSF leaks after endoscopic transsphenoidal surgery in acromegalic patients. For the model development a registry from Bologna, Italy was used. External validation was then performed using data from Zurich, Switzerland. Gender, age, prior surgery, as well as Hardy and Knosp classification were used as input features. Discrimination and calibration metrics were assessed. Results: The derivation cohort consisted of 307 patients (43.3% male; mean [SD] age, 47.2 [12.7] years). GTR was achieved in 226 (73.6%) and BR in 245 (79.8%) patients. In the external validation cohort with 46 patients, 31 (75.6%) achieved GTR and 31 (77.5%) achieved BR. Area under the curve (AUC) at external validation was 0.75 (95% confidence interval: 0.59–0.88) for GTR, 0.63 (0.40–0.82) for BR, as well as 0.77 (0.62–0.91) for intraoperative CSF leaks. While prior surgery was the most important variable for prediction of GTR, age and Hardy grading contributed most to the predictions of BR and CSF leaks, respectively. Conclusions: Gross total resection, biochemical remission, and CSF leaks remain hard to predict, but machine learning offers potential in helping to tailor surgical therapy. We demonstrate the feasibility of developing and externally validating clinical prediction models for these outcomes after surgery for acromegaly and lay the groundwork for development of a multicenter model with more robust generalization.


Author(s):  
Akshata Kulkarni

Abstract: Officials around the world are using several COVID-19 outbreak prediction models to make educated decisions and enact necessary control measures. In this study, we developed a Machine Learning model which predicts and forecasts the COVID-19 outbreak in India, with the goal of determining the best regression model for an in-depth examination of the novel coronavirus. Based on data available from January 31 to October 31, 2020, collected from Kaggle, this model predicts the number of confirmed cases in Maharashtra. We're using a Machine Learning model to foresee the future trend of these situations. The project has the potential to demonstrate the importance of information dissemination in improving response time and planning ahead of time to help reduce risk.


2022 ◽  
Vol 8 ◽  
Author(s):  
Jinzhang Li ◽  
Ming Gong ◽  
Yashutosh Joshi ◽  
Lizhong Sun ◽  
Lianjun Huang ◽  
...  

Background: Acute renal failure (ARF) is the most common major complication following cardiac surgery for acute aortic syndrome (AAS) and worsens the postoperative prognosis. Our aim was to establish a machine learning prediction model for ARF occurrence in AAS patients. Methods: We included AAS patient data from nine medical centers (n = 1,637) and analyzed the incidence of ARF and the risk factors for postoperative ARF. We used data from six medical centers to compare the performance of four machine learning models and performed internal validation to identify AAS patients who developed postoperative ARF. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was used to compare the performance of the predictive models. We compared the performance of the optimal machine learning prediction model with that of traditional prediction models. Data from three medical centers were used for external validation. Results: The eXtreme Gradient Boosting (XGBoost) algorithm performed best in the internal validation process (AUC = 0.82), which was better than both the logistic regression (LR) prediction model (AUC = 0.77, p < 0.001) and the traditional scoring systems. Upon external validation, the XGBoost prediction model (AUC = 0.81) also performed better than both the LR prediction model (AUC = 0.75, p = 0.03) and the traditional scoring systems. We created an online application based on the XGBoost prediction model. Conclusions: We have developed a machine learning model that has better predictive performance than traditional LR prediction models as well as other existing risk scoring systems for postoperative ARF. This model can be utilized to provide early warnings when high-risk patients are found, enabling clinicians to take prompt measures.
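The AUC used here to compare models has a simple interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition. The function name and the four example scores are illustrative assumptions, not data from the study, and the pairwise loop is only suitable for small samples.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels : sequence of 0/1 outcomes (1 = event occurred)
    scores : sequence of predicted risks, higher = more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Invented example: two patients without the event, two with it.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 3 of the 4 positive/negative pairs are ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading values such as 0.82 or 0.75.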


2019 ◽  
Author(s):  
Yanli Zhang-James ◽  
Qi Chen ◽  
Ralf Kuja-Halkola ◽  
Paul Lichtenstein ◽  
Henrik Larsson ◽  
...  

Abstract. Background: Children with attention-deficit/hyperactivity disorder (ADHD) have a high risk for substance use disorders (SUDs). Early identification of at-risk youth would help allocate scarce resources for prevention programs. Methods: Psychiatric and somatic diagnoses, family history of these disorders, measures of socioeconomic distress and information about birth complications were obtained from the national registers in Sweden for 19,787 children with ADHD born between 1989-1993. We trained 1) cross-sectional machine learning models using data available by age 17 to predict SUD diagnosis between ages 18-19; and 2) a longitudinal model to predict new diagnoses at each age. Results: The area under the receiver operating characteristic curve (AUC) was 0.73 and 0.71 for the random forest and multilayer perceptron cross-sectional models. A prior diagnosis of SUD was the most important predictor, accounting for 25% of correct predictions. However, after excluding this predictor, our model still significantly predicted the first-time diagnosis of SUD during age 18-19 with an AUC of 0.67. The average of the AUCs from longitudinal models predicting new diagnoses one, two, five and ten years in the future was 0.63. Conclusions: Significant predictions of at-risk co-morbid SUDs in individuals with ADHD can be achieved using population registry data, even many years prior to the first diagnosis. Longitudinal models can potentially monitor their risks over time. More work is needed to create prediction models based on electronic health records or linked population-registers that are sufficiently accurate for use in the clinic.


2020 ◽  
Author(s):  
Joon Lee

In contrast with medical imaging diagnostics powered by artificial intelligence (AI), in which deep learning has led to breakthroughs in recent years, patient outcome prediction poses an inherently challenging problem because it focuses on events that have not yet occurred. Interestingly, the performance of machine learning–based patient outcome prediction models has rarely been compared with that of human clinicians in the literature. Human intuition and insight may be sources of underused predictive information that AI will not be able to identify in electronic data. Both human and AI predictions should be investigated together with the aim of achieving a human-AI symbiosis that synergistically and complementarily combines AI with the predictive abilities of clinicians.


Symmetry ◽  
2020 ◽  
Vol 12 (5) ◽  
pp. 728 ◽  
Author(s):  
Lijuan Yan ◽  
Yanshen Liu

Student performance prediction has become a hot research topic. Most existing prediction models are built with machine learning methods; they focus on prediction accuracy but pay less attention to interpretability. We propose a stacking ensemble model to predict and analyze student performance in academic competition. In this model, student performance is classified into two symmetrical categorical classes. To improve accuracy, three machine learning algorithms, namely support vector machine (SVM), random forest, and AdaBoost, are established in the first level and then integrated by logistic regression via stacking. A feature importance analysis was applied to identify important variables. The experimental data were collected from four academic years in Hankou University. According to comparative studies on five evaluation metrics (precision, recall, F1, error, and area under the receiver operating characteristic curve (AUC)), the proposed model generally performs better than the compared models. The important variables identified from the analysis are interpretable and can be used as guidance to select potential students.
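The two-level architecture described in this abstract (SVM, random forest and AdaBoost at the first level, combined by logistic regression) can be sketched with scikit-learn's `StackingClassifier`. This is a hedged illustration rather than the authors' implementation: the data are synthetic output from `make_classification`, and every hyperparameter shown (sample size, number of trees, 5-fold cross-validation, random seeds) is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data standing in for student records (assumption).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# First level: the three base learners named in the abstract.
base_learners = [
    ("svm", SVC(probability=True, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
]

# Second level: logistic regression trained on cross-validated
# predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Training the second-level model on cross-validated rather than in-sample base predictions is what keeps the stack from simply rewarding whichever base learner overfits the most.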


Author(s):  
Jaime Lynn Speiser ◽  
Kathryn E Callahan ◽  
Denise K Houston ◽  
Jason Fanning ◽  
Thomas M Gill ◽  
...  

Abstract. Background: Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty in understanding the complex algorithms that underlie models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. Method: We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Results: Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated using data from the LIFE study, the performance of prediction models for serious fall injury was moderate at best (area under the receiver operating characteristic curve of 0.54 for decision tree and 0.66 for random forest). Conclusions: Machine learning methods offer an alternative to traditional approaches for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
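The "flow chart" description of a decision tree, and the aggregation step of a random forest, can be made concrete with a toy sketch. Everything below is invented for illustration: the feature names, thresholds and risk labels are hypothetical and are not derived from the LIFE study, and real trees are learned from data (in a random forest, from bootstrap samples and random feature subsets) rather than written by hand.

```python
from collections import Counter


# Each "tree" is a small flow chart of yes/no questions (all hypothetical).
def tree_a(person):
    if person["age"] >= 80:
        return "high" if person["gait_speed"] < 0.8 else "low"
    return "low"


def tree_b(person):
    if person["prior_falls"] >= 1:
        return "high"
    return "high" if person["gait_speed"] < 0.6 else "low"


def tree_c(person):
    if person["gait_speed"] < 0.8:
        return "high" if person["age"] >= 75 else "low"
    return "low"


def forest_predict(trees, person):
    """Aggregate the trees' answers by majority vote."""
    votes = Counter(tree(person) for tree in trees)
    return votes.most_common(1)[0][0]


trees = [tree_a, tree_b, tree_c]

# Hypothetical participants.
person_1 = {"age": 82, "gait_speed": 0.7, "prior_falls": 0}
person_2 = {"age": 70, "gait_speed": 1.0, "prior_falls": 0}

print(forest_predict(trees, person_1))  # trees vote high, low, high -> "high"
print(forest_predict(trees, person_2))  # all three trees vote "low"
```

A single tree can be read off line by line, which is the interpretability the abstract refers to; the forest trades some of that transparency for the stability that comes from averaging over many trees.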

