Predicting Tree Sap Flux and Stomatal Conductance from Drone-Recorded Surface Temperatures in a Mixed Agroforestry System—A Machine Learning Approach

Plant transpiration is a key element in the hydrological cycle. Widely used methods for its assessment comprise sap flux techniques for whole-plant transpiration and porometry for leaf stomatal conductance. Recently emerging approaches based on surface temperatures and a wide range of machine learning techniques offer new possibilities to quantify transpiration. The focus of this study was to predict sap flux and leaf stomatal conductance based on drone-recorded and meteorological data and compare these predictions with in-situ measured transpiration. To build the prediction models, we applied classical statistical approaches and machine learning algorithms. The field work was conducted in an oil palm agroforest in lowland Sumatra. Random forest predictions yielded the highest congruence with measured sap flux (r2 = 0.87 for trees and r2 = 0.58 for palms) and confidence intervals for intercept and slope of a Passing-Bablok regression suggest interchangeability of the methods. Differences in model performance are indicated when predicting different tree species. Predictions for stomatal conductance were less congruent for all prediction methods, likely due to spatial and temporal offsets of the measurements. Overall, the applied drone and modelling scheme predicts whole-plant transpiration with high accuracy. We conclude that there is large potential in machine learning approaches for ecological applications such as predicting transpiration.

Download Full-text

Machine Learning Methodologies for Prediction of Rhythm-Control Strategy in Patients Diagnosed with Atrial Fibrillation: Model Development and Comparison Study (Preprint)

10.2196/preprints.29225 ◽

2021 ◽

Author(s):

Rachel Soon-Yong Kim ◽

Steve Simon ◽

Brett Powers ◽

Amneet Sandhu ◽

Jose Sanchez ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Control Strategy ◽

Prediction Models ◽

Rhythm Control ◽

Machine Learning Algorithms ◽

Support Tool ◽

Wide Range ◽

Complex Models ◽

The Impact

BACKGROUND Identification of the appropriate rhythm management strategy for patients diagnosed with atrial fibrillation (AF) remains a major challenge for providers. While clinical trials have identified sub-groups of patients in whom a rate- or rhythm-control strategy might be indicated to improve outcomes, the wide range of presentations and risk factors among patients presenting with AF makes such approaches challenging. A strength of electronic health records (EHR) is the ability to build in logic to guide management decisions, such that the system can automatically identify patients in whom a rhythm-control strategy is more likely and promote efficient referrals to specialists. However, like any clinical decision-support tool, there is a balance between interpretability and accurate prediction. OBJECTIVE In this investigation, we sought to create an EHR-based prediction tool to guide patient referral to specialists for rhythm-control management by comparing different machine learning algorithms. METHODS We compared machine learning models of increasing complexity and using up to 50,845 variables to predict the rhythm-control strategy in 42,022 patients within the UC Health system at the time of AF diagnosis. Models were evaluated on their classification accuracy, defined by the F1 score and other metrics, and interpretability, captured by inspection of the relative importance of each predictor. RESULTS We found that age was by far the strongest single predictor of a rhythm-control strategy, but that greater accuracy could be achieved with more complex models incorporating neural networks and more predictors for each subject. We determined that the impact of better prediction models was notable primarily in the rate of inappropriate referrals for rhythm-control, in which more complex models provided an average of 20% fewer inappropriate referrals than simpler, more interpretable models. CONCLUSIONS We conclude that any healthcare system seeking to incorporate algorithms to guide rhythm management for patients with AF will need to address this trade-off between prediction accuracy and model interpretability.

Download Full-text

COVID-19 Outbreak Prediction with Machine Learning

10.34055/osf.io/xr4js ◽

2020 ◽

Author(s):

Sina Faizollahzadeh Ardabili ◽

Amir Mosavi ◽

Pedram Ghamisi ◽

Filip Ferdinand ◽

Annamaria R. Varkonyi-Koczy ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Fuzzy Inference ◽

Control Measures ◽

Future Research ◽

Complex Nature ◽

Inference System ◽

Wide Range ◽

Standard Models ◽

High Level

Several outbreak prediction models for COVID-19 are being used by officials around the world to make informed-decisions and enforce relevant control measures. Among the standard models for COVID-19 global pandemic prediction, simple epidemiological and statistical models have received more attention by authorities, and they are popular in the media. Due to a high level of uncertainty and lack of essential data, standard models have shown low accuracy for long-term prediction. Although the literature includes several attempts to address this issue, the essential generalization and robustness abilities of existing models needs to be improved. This paper presents a comparative analysis of machine learning and soft computing models to predict the COVID-19 outbreak as an alternative to SIR and SEIR models. Among a wide range of machine learning models investigated, two models showed promising results (i.e., multi-layered perceptron, MLP, and adaptive network-based fuzzy inference system, ANFIS). Based on the results reported here, and due to the highly complex nature of the COVID-19 outbreak and variation in its behavior from nation-to-nation, this study suggests machine learning as an effective tool to model the outbreak. This paper provides an initial benchmarking to demonstrate the potential of machine learning for future research. Paper further suggests that real novelty in outbreak prediction can be realized through integrating machine learning and SEIR models.

Download Full-text

Intelligent Techniques Analysis for Glycosylation Site Prediction

Current Bioinformatics ◽

10.2174/1574893615666210108094847 ◽

2021 ◽

Vol 15 ◽

Author(s):

Alhassan Alkuhlani ◽

Walaa Gad ◽

Mohamed Roushdy ◽

Abdel-Badeeh M. Salem

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Cell Interaction ◽

Glycosylation Site ◽

Machine Learning Classification ◽

Site Prediction ◽

Glycosylation Sites ◽

Wide Range ◽

Feature Extraction And Selection ◽

Computational Intelligent

Background: Glycosylation is one of the most common post-translation modifications (PTMs) in organism cells. It plays important roles in several biological processes including cell-cell interaction, protein folding, antigen’s recognition, and immune response. In addition, glycosylation is associated with many human diseases such as cancer, diabetes and coronaviruses. The experimental techniques for identifying glycosylation sites are time-consuming, extensive laboratory work, and expensive. Therefore, computational intelligence techniques are becoming very important for glycosylation site prediction. Objective: This paper is a theoretical discussion of the technical aspects of the biotechnological (e.g., using artificial intelligence and machine learning) to digital bioinformatics research and intelligent biocomputing. The computational intelligent techniques have shown efficient results for predicting N-linked, O-linked and C-linked glycosylation sites. In the last two decades, many studies have been conducted for glycosylation site prediction using these techniques. In this paper, we analyze and compare a wide range of intelligent techniques of these studies from multiple aspects. The current challenges and difficulties facing the software developers and knowledge engineers for predicting glycosylation sites are also included. Method: The comparison between these different studies is introduced including many criteria such as databases, feature extraction and selection, machine learning classification methods, evaluation measures and the performance results. Results and conclusions: Many challenges and problems are presented. Consequently, more efforts are needed to get more accurate prediction models for the three basic types of glycosylation sites.

Download Full-text

Development of Prediction Models Using Machine Learning Algorithms for Girls with Suspected Central Precocious Puberty: Retrospective Study (Preprint)

10.2196/preprints.11728 ◽

2018 ◽

Author(s):

Liyan Pan ◽

Guangjian Liu ◽

Xiaojian Mao ◽

Huixian Li ◽

Jiexin Zhang ◽

...

Keyword(s):

Machine Learning ◽

Retrospective Study ◽

Random Forest ◽

Precocious Puberty ◽

Prediction Models ◽

Central Precocious Puberty ◽

Machine Learning Algorithms ◽

Stimulation Test ◽

Gnrh Analogue ◽

Prediction Probability

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis—gonadotropin-releasing hormone (GnRH)–stimulation test or GnRH analogue (GnRHa)–stimulation test—is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP–related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured sensitivity, specificity, and area under receiver operating characteristic (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.

Download Full-text

Development of Machine Learning Models for Prediction of Smoking Cessation Outcome

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18052584 ◽

2021 ◽

Vol 18 (5) ◽

pp. 2584

Author(s):

Cheng-Chien Lai ◽

Wei-Hsin Huang ◽

Betty Chia-Chen Chang ◽

Lee-Ching Hwang

Keyword(s):

Machine Learning ◽

Smoking Cessation ◽

Success Rate ◽

Prediction Models ◽

Smoking Status ◽

Medical Center ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Support Vector ◽

Smoking Cessation Outcome

Predictors for success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study is to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data was acquired from patients underwent smoking cessation program at one medical center in Northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models with artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the final smoking status of the patients in a six-month period. Sensitivity, specificity, accuracy, and area under receiver operating characteristic (ROC) curve (AUC or ROC value) were used to determine the performance of the models. We adopted the ANN model which reached a slightly better performance, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) for prediction in smoking cessation outcome. A predictive model for smoking cessation was constructed. The model could aid in providing the predicted success rate for all smokers. It also had the potential to achieve personalized and precision medicine for treatment of smoking cessation.

Download Full-text

57 Precision neoantigen discovery using novel algorithms and expanded HLA-ligandome datasets

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-sitc2020.0057 ◽

2020 ◽

Vol 8 (Suppl 3) ◽

pp. A62-A62

Author(s):

Dattatreya Mellacheruvu ◽

Rachel Pyke ◽

Charles Abbott ◽

Nick Phillips ◽

Sejal Desai ◽

...

Keyword(s):

Machine Learning ◽

Cell Lines ◽

Antigen Processing ◽

Large Scale ◽

Prediction Models ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Training Data ◽

High Quality ◽

Tissue Samples

BackgroundAccurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended upon our work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies.MethodsIn-house immunopeptidomic data was generated using stably transfected HLA-null K562 cells lines that express a single HLA allele of interest, followed by immunoprecipitation using W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform.ResultsWe have generated large-scale and high-quality immunopeptidomics data by using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding pockets while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples.ConclusionsImproving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines, and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared to a state-of-the-art public algorithm and furthers this objective.

Download Full-text

Predicting hospitalization following psychiatric crisis care using machine learning

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01361-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Prediction Models ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Ensemble Model ◽

K Nearest Neighbors ◽

Crisis Care

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric criseis need hospitalization are lacking and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression) to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. Target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis and the five best performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.

Download Full-text

Chronic Kidney Disease Prediction using Machine Learning Models

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a2213.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 6364-6367

Keyword(s):

Machine Learning ◽

Chronic Kidney Disease ◽

Kidney Disease ◽

Kidney Failure ◽

Bone Diseases ◽

Machine Learning Algorithms ◽

Knowledge Generation ◽

Good Prediction ◽

Worst Case ◽

Wide Range

The field of biosciences have advanced to a larger extent and have generated large amounts of information from Electronic Health Records. This have given rise to the acute need of knowledge generation from this enormous amount of data. Data mining methods and machine learning play a major role in this aspect of biosciences. Chronic Kidney Disease(CKD) is a condition in which the kidneys are damaged and cannot filter blood as they always do. A family history of kidney diseases or failure, high blood pressure, type 2 diabetes may lead to CKD. This is a lasting damage to the kidney and chances of getting worser by time is high. The very common complications that results due to a kidney failure are heart diseases, anemia, bone diseases, high potasium and calcium. The worst case situation leads to complete kidney failure and necessitates kidney transplant to live. An early detection of CKD can improve the quality of life to a greater extent. This calls for good prediction algorithm to predict CKD at an earlier stage . Literature shows a wide range of machine learning algorithms employed for the prediction of CKD. This paper uses data preprocessing,data transformation and various classifiers to predict CKD and also proposes best Prediction framework for CKD. The results of the framework show promising results of better prediction at an early stage of CKD

Download Full-text

Determination of Significant Features for Building an Efficient Heart Disease Prediction System

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b3393.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 4499-4504

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Prediction Model ◽

Prediction Models ◽

Heart Diseases ◽

Medical Diagnostics ◽

Medical Data ◽

Machine Learning Algorithms ◽

Prediction System ◽

Early Stages

Heart diseases are responsible for the greatest number of deaths all over the world. These diseases are usually not detected in early stages as the cost of medical diagnostics is not affordable by a majority of the people. Research has shown that machine learning methods have a great capability to extract valuable information from the medical data. This information is used to build the prediction models which provide cost effective technological aid for a medical practitioner to detect the heart disease in early stages. However, the presence of some irrelevant and redundant features in medical data deteriorates the competence of the prediction system. This research was aimed to improve the accuracy of the existing methods by removing such features. In this study, brute force-based algorithm of feature selection was used to determine relevant significant features. After experimenting rigorously with 7528 possible combinations of features and 5 machine learning algorithms, 8 important features were identified. A prediction model was developed using these significant features. Accuracy of this model is experimentally calculated to be 86.4%which is higher than the results of existing studies. The prediction model proposed in this study shall help in predicting heart disease efficiently.

Download Full-text

An integrated review on machine learning approaches for heart disease prediction: Direction towards future research gaps

Bio-Algorithms and Med-Systems ◽

10.1515/bams-2020-0069 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Fathima Aliyar Vellameeran ◽

Thomas Brindha

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Future Research ◽

Disease Prediction ◽

Learning Approaches ◽

Research Papers ◽

Using Data ◽

Intelligent Methods

Abstract Objectives To make a clear literature review on state-of-the-art heart disease prediction models. Methods It reviews 61 research papers and states the significant analysis. Initially, the analysis addresses the contributions of each literature works and observes the simulation environment. Here, different types of machine learning algorithms deployed in each contribution. In addition, the utilized dataset for existing heart disease prediction models was observed. Results The performance measures computed in entire papers like prediction accuracy, prediction error, specificity, sensitivity, f-measure, etc., are learned. Further, the best performance is also checked to confirm the effectiveness of entire contributions. Conclusions The comprehensive research challenges and the gap are portrayed based on the development of intelligent methods concerning the unresolved challenges in heart disease prediction using data mining techniques.

Download Full-text