Development of Rainfall Prediction Models Using Machine Learning Approaches for Different Agro-Climatic Zones

This study models changes in rainfall patterns caused by climate change in different agro-climatic zones (ACZs) by statistically downscaling large-scale climate variables with machine learning approaches. The potential of three machine learning algorithms has been investigated: the multilayer artificial neural network (MLANN), the radial basis function neural network (RBFNN), and the least squares support vector machine (LS-SVM). The large-scale climate variables are obtained from the National Centers for Environmental Prediction (NCEP) reanalysis product and used as predictors for model development. The proposed machine learning models are then applied to generate projected rainfall time series for the period 2021-2050, using Hadley Centre Coupled Model (HadCM3) B2 emission scenario data as predictors. An increasing trend in projected rainfall is observed during 2021-2050 in all the ACZs of Chhattisgarh State. Among the machine learning models, RBFNN was found to be the most feasible technique for modelling monthly rainfall in this region.
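The abstract names RBFNN as the best-performing downscaling model. The sketch below shows what such a network computes, assuming Gaussian hidden units with a shared, hand-picked width `sigma`, fixed centers (in practice these are often chosen by k-means clustering), and output weights fitted by ridge-regularised least squares. The class and function names are hypothetical, the synthetic sine data only stands in for NCEP predictors and observed rainfall, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian(u: Vector, v: Vector, sigma: float) -> float:
    """Gaussian radial basis function of the Euclidean distance between u and v."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma * sigma))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy, inputs untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class RBFNN:
    """One hidden layer of Gaussian units followed by a linear output unit."""

    def __init__(self, centers: List[Vector], sigma: float, ridge: float = 1e-8):
        self.centers = centers
        self.sigma = sigma
        self.ridge = ridge
        self.weights: List[float] = []

    def _hidden(self, x: Vector) -> List[float]:
        return [gaussian(x, c, self.sigma) for c in self.centers]

    def fit(self, xs: List[Vector], ys: List[float]) -> "RBFNN":
        phi = [self._hidden(x) for x in xs]
        k = len(self.centers)
        # Ridge-regularised normal equations: (Phi^T Phi + ridge * I) w = Phi^T y
        a = [[sum(row[p] * row[q] for row in phi) + (self.ridge if p == q else 0.0)
              for q in range(k)] for p in range(k)]
        b = [sum(row[p] * y for row, y in zip(phi, ys)) for p in range(k)]
        self.weights = solve(a, b)
        return self

    def predict(self, xs: List[Vector]) -> List[float]:
        return [sum(w * h for w, h in zip(self.weights, self._hidden(x))) for x in xs]


# Synthetic demo: one predictor on [0, 10] and a smooth target (not NCEP or rainfall data).
xs = [[0.5 * i] for i in range(21)]
ys = [math.sin(x[0]) for x in xs]
model = RBFNN(centers=[[float(c)] for c in range(-1, 12)], sigma=1.0).fit(xs, ys)
fitted = model.predict(xs)
```

With several large-scale predictors, each input vector would simply hold one value per predictor; the Gaussian units work on the Euclidean distance in that space, so predictors on very different scales would usually be standardised first.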

Effectiveness, Explainability and Reliability of Machine Meta-Learning Methods for Predicting Mortality in Patients with COVID-19: Results of the Brazilian COVID-19 Registry

2021. DOI: 10.1101/2021.11.01.21265527

Author(s): Bruno Barbosa Miranda de Paiva, Polianna Delfino Pereira, Claudio Moises Valiense de Andrade, Virginia Mara Reis Gomes, Maria Clara Pontello Barbosa Lima, ...

Keyword(s): Machine Learning; Prediction Models; State Of The Art; Laboratory Data; Machine Learning Algorithms; Training Data; Learning Models; Learning Methods; Meta Learning; Machine Learning Models

Objective: To provide a thorough comparative study of state-of-the-art machine learning methods and statistical methods for predicting in-hospital mortality in COVID-19 patients from data available upon hospital admission; to study the reliability of the predictions of the most effective methods by correlating the predicted probability of the outcome with the accuracy of the methods; and to investigate how explainable the predictions produced by the most effective methods are.

Materials and Methods: De-identified data were obtained from COVID-19-positive patients in 36 participating hospitals from March 1 to September 30, 2020. Demographic, comorbidity, clinical presentation and laboratory data were used as training data to develop COVID-19 mortality prediction models. Multiple machine learning and traditional statistical models were trained on this prediction task using a k-fold cross-validation procedure, from which we assessed performance and interpretability metrics.

Results: Stacking of machine learning models improved on the previous state-of-the-art results by more than 26% in predicting the class of interest (death), achieving an AUROC of 87.1% and a macro-F1 of 73.9%. We also show that some machine learning models can be highly interpretable and reliable, yielding more accurate predictions while providing a good explanation of why.

Conclusion: The best results were obtained with the meta-learning ensemble model Stacking. State-of-the-art explainability techniques such as SHAP values can be used to draw useful insights into the patterns learned by machine learning algorithms. Machine learning models can be more explainable than traditional statistical models while also yielding highly reliable predictions.

Key words: COVID-19; prognosis; prediction model; machine learning
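The Results paragraph reports AUROC and macro-F1. As a reminder of what these two numbers measure, here is a minimal pure-Python sketch: AUROC is computed as the probability that a randomly chosen positive (death) is scored above a randomly chosen negative, with ties counted as one half, and macro-F1 is the unweighted mean of the per-class F1 scores. The function names are hypothetical and this is not the authors' evaluation code.

```python
from itertools import product
from typing import Hashable, Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):  # every positive/negative pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean over classes of F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)
```

The pairwise loop is quadratic in the number of patients; for large cohorts a rank-based formulation of the same statistic is usually preferred. Because macro-F1 weights the minority class (death) equally with the majority class, it is typically lower than accuracy on imbalanced outcomes.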

Data Augmentation and Pretraining for Template-Based Retrosynthetic Prediction in Computer-Aided Synthesis Planning

2020. DOI: 10.26434/chemrxiv.11811564.v1. Cited by ~1

Author(s): Michael Fortunato, Connor W. Coley, Brian Barnes, Klavs F. Jensen

Keyword(s): Neural Network; Machine Learning; Data Augmentation; Machine Learning Algorithms; Learning Models; The Neural Network; Computer Aided; Synthesis Planning; The One; Machine Learning Models

This work presents efforts to improve the performance of data-driven machine learning algorithms for reaction template recommendation in computer-aided synthesis planning software. Machine learning models designed to prioritize reaction templates or molecular transformations often focus on reporting high accuracy for the one-to-one mapping from each product molecule in a reaction database to the template extracted from its recorded reaction. The templates selected for inclusion in these models have previously been limited to those that appear frequently in reaction databases, which excludes potentially useful transformations. By augmenting open-access datasets of organic reactions with artificially calculated template applicability, and pretraining a template relevance neural network on this augmented applicability dataset, we report an increase in template applicability recall and in the diversity of predicted precursors. The augmentation and pretraining effectively teach the neural network a larger set of templates that could theoretically lead to successful reactions for a given target. Even on a small dataset of well-curated reactions, the data augmentation and pretraining methods increased top-1 accuracy, especially for rare templates, indicating that these strategies can be very useful for small datasets.
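To make the difference between the one-to-one recorded-template target and an augmented applicability target concrete, the sketch below builds both kinds of label vectors and computes top-k accuracy. It relies on a hypothetical predicate `substring_applies` as a placeholder: real template applicability requires applying the reaction template to the product molecule with a cheminformatics toolkit, which is not shown. All names here are illustrative assumptions, not the authors' code.

```python
from typing import Callable, List, Sequence


def recorded_labels(recorded: Sequence[str], templates: Sequence[str]) -> List[List[int]]:
    """One-hot targets: only the template extracted from the recorded reaction."""
    return [[1 if t == r else 0 for t in templates] for r in recorded]


def applicability_labels(products: Sequence[str], templates: Sequence[str],
                         applies: Callable[[str, str], bool]) -> List[List[int]]:
    """Multi-hot targets: 1 for every template that can be applied to the product."""
    return [[1 if applies(t, p) else 0 for t in templates] for p in products]


def top_k_accuracy(rankings: Sequence[Sequence[str]], recorded: Sequence[str], k: int) -> float:
    """Fraction of products whose recorded template is among the top k ranked templates."""
    hits = sum(1 for ranked, true in zip(rankings, recorded) if true in ranked[:k])
    return hits / len(recorded)


def substring_applies(template: str, product: str) -> bool:
    """Hypothetical stand-in for real template application; NOT chemically meaningful."""
    return template in product
```

Pretraining on the multi-hot targets asks the network to score every applicable template highly, rather than only the single recorded one, which is the intuition behind the reported gains for rare templates.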

Machine Learning with ROOT/TMVA

EPJ Web of Conferences, Vol 245, pp. 06019, 2020. DOI: 10.1051/epjconf/202024506019

Author(s): Kim Albertsson, Sitong An, Sergei Gleyzer, Lorenzo Moneta, Joana Niermann, ...

Keyword(s): Neural Network; Machine Learning; Deep Learning; Large Scale; Learning Tools; Learning Models; New Developments; Training Time; Future Developments; Machine Learning Models

ROOT provides, through TMVA, machine learning tools for data analysis at high-energy physics (HEP) experiments and beyond. We present features recently added to TMVA and the strategy for future developments in the diversified machine learning landscape. Focus is put on fast machine learning inference, which enables analysts to deploy their machine learning models rapidly on large-scale datasets. The new developments are paired with newly designed C++ and Python interfaces that support modern C++ paradigms and full interoperability with the Python ecosystem. We also present a new deep learning implementation of convolutional neural networks that uses the cuDNN library on GPUs. We show benchmarking results in terms of training time and inference time, compared with other machine learning libraries such as Keras/TensorFlow.

Feasibility of Using Group Method of Data Handling (GMDH) Approach for Horizontal Coordinate Transformation

Geodesy and Cartography, Vol 46 (2), pp. 55-66, 2020. DOI: 10.3846/gac.2020.10486

Author(s): Bernard Kumi-Boateng, Yao Yevenyo Ziggah

Keyword(s): Neural Network; Machine Learning; Coordinate Transformation; Machine Learning Algorithms; Group Method; Data Handling; Learning Models; Data Set; Functional Relationships; Machine Learning Models

Machine learning algorithms have emerged as a new paradigm in geoscience computations and applications. The present study assesses the suitability of the Group Method of Data Handling (GMDH) for coordinate transformation. The data used for the coordinate transformation come from the Ghana national triangulation network, which is based on the two horizontal geodetic datums (Accra 1929 and Leigon 1977) used for geospatial applications in Ghana. The GMDH results were compared with other standard methods: the Backpropagation Neural Network (BPNN), the Radial Basis Function Neural Network (RBFNN), and the 2D conformal and 2D affine transformations. The proposed GMDH approach was observed to be very efficient in transforming coordinates from the Leigon 1977 datum to the official mapping datum of Ghana, i.e. the Accra 1929 datum. GMDH was also found to produce satisfactory results comparable to those of the widely used BPNN and RBFNN. However, the classical transformation methods (2D affine and 2D conformal) performed poorly compared with the machine learning models (GMDH, BPNN and RBFNN). The computational strength of the machine learning models is attributed to their self-adaptive capability to detect patterns in the data set without requiring a predefined functional relationship between the input and output variables. The proposed GMDH model could therefore be used as a supplementary computational tool alongside the existing transformation procedures in the Ghana geodetic reference network.
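The abstract compares GMDH against the classical 2D conformal and 2D affine transformations. For reference, a 2D conformal (four-parameter similarity) transformation can be fitted by least squares in closed form after centering both point sets. The sketch below assumes the parametrisation X = a*x - b*y + tx, Y = b*x + a*y + ty, uses hypothetical function names and synthetic control points, and is neither the authors' code nor the GMDH model itself.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Params = Tuple[float, float, float, float]


def fit_conformal_2d(src: Sequence[Point], dst: Sequence[Point]) -> Params:
    """Least-squares fit of X = a*x - b*y + tx, Y = b*x + a*y + ty; returns (a, b, tx, ty)."""
    n = len(src)
    if n < 2 or n != len(dst):
        raise ValueError("need at least two matched point pairs")
    mx_s = sum(x for x, _ in src) / n
    my_s = sum(y for _, y in src) / n
    mx_d = sum(x for x, _ in dst) / n
    my_d = sum(y for _, y in dst) / n
    num_a = num_b = ss = 0.0
    for (x, y), (X, Y) in zip(src, dst):
        dx, dy = x - mx_s, y - my_s  # centered source coordinates
        dX, dY = X - mx_d, Y - my_d  # centered target coordinates
        num_a += dx * dX + dy * dY
        num_b += dx * dY - dy * dX
        ss += dx * dx + dy * dy
    a, b = num_a / ss, num_b / ss
    tx = mx_d - a * mx_s + b * my_s
    ty = my_d - b * mx_s - a * my_s
    return a, b, tx, ty


def apply_conformal_2d(params: Params, pts: Sequence[Point]) -> List[Point]:
    a, b, tx, ty = params
    return [(a * x - b * y + tx, b * x + a * y + ty) for x, y in pts]


def rmse(pred: Sequence[Point], obs: Sequence[Point]) -> float:
    """Root-mean-square horizontal position error."""
    sq = sum((px - ox) ** 2 + (py - oy) ** 2 for (px, py), (ox, oy) in zip(pred, obs))
    return math.sqrt(sq / len(obs))


# Synthetic control points with known parameters (not the Ghana network data).
src = [(0.0, 0.0), (100.0, 20.0), (250.0, 400.0), (-80.0, 310.0), (500.0, -150.0)]
true_params = (0.9998, 0.0003, 120.5, -45.2)
dst = apply_conformal_2d(true_params, src)
est = fit_conformal_2d(src, dst)
```

The scale factor and rotation follow from the fitted parameters as `math.hypot(a, b)` and `math.atan2(b, a)`. A 2D affine fit instead has six parameters, allowing different scales along the two axes and shear.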

An atlas of robust microbiome associations with phenotypic traits based on large-scale cohorts from two continents

2020. DOI: 10.1101/2020.05.28.122325

Author(s): Daphna Rothschild, Sigal Leviatan, Ariel Hanemann, Yossi Cohen, Omer Weissbrod, ...

Keyword(s): Machine Learning; Large Scale; Prediction Models; High Accuracy; Phenotypic Traits; Learning Models; Microbial Genomes; Microbiome Data; The U.S; Machine Learning Models

Numerous human conditions are associated with the microbiome, yet studies are inconsistent as to the magnitude of the associations and the bacteria involved, likely reflecting insufficient sample sizes. Here, we collected diverse phenotypes and gut microbiota from 34,057 individuals from Israel and the U.S. Analyzing these data with a much-expanded set of microbial genomes, we derive an atlas of numerous robust, previously unreported associations between bacteria and physiological human traits, which we show replicate in cohorts from both continents. Using machine learning models trained on microbiome data, we predict human traits with high accuracy across continents. Subsampling our cohort to smaller sizes yielded highly variable models, and thus sensitivity to the selected cohort, underscoring the utility of large cohorts and possibly explaining the source of discrepancies across studies. Finally, many of our prediction models saturate at these cohort sizes, suggesting that similar analyses on larger cohorts may not further improve these predictions.
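The observation that subsampled cohorts produce highly variable models can be illustrated with any estimator. The sketch below assumes a synthetic cohort and uses a Pearson correlation as a stand-in for a trained model's performance: it repeatedly draws subsamples of a given size without replacement and reports the spread of the estimate. The helper names are hypothetical and the numbers are not from the study.

```python
import random
from statistics import pstdev
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def subsample_spread(xs: Sequence[float], ys: Sequence[float], size: int,
                     repeats: int, rng: random.Random) -> float:
    """Standard deviation of the estimate across `repeats` subsamples of `size` individuals."""
    estimates: List[float] = []
    for _ in range(repeats):
        pick = rng.sample(range(len(xs)), size)  # drawn without replacement
        estimates.append(pearson([xs[i] for i in pick], [ys[i] for i in pick]))
    return pstdev(estimates)


# Synthetic "cohort": a weak linear association plus noise (not study data).
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ys = [0.3 * x + rng.gauss(0.0, 1.0) for x in xs]
spread_small = subsample_spread(xs, ys, size=30, repeats=200, rng=rng)
spread_large = subsample_spread(xs, ys, size=2000, repeats=200, rng=rng)
print(f"spread at n=30: {spread_small:.3f}, at n=2000: {spread_large:.3f}")
```

The spread shrinks roughly with the square root of the subsample size, which is the same mechanism that makes small-cohort results sensitive to which individuals happen to be included.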

Training machine learning models on climate model output yields skillful interpretable seasonal precipitation forecasts

Communications Earth & Environment, Vol 2 (1), 2021. DOI: 10.1038/s43247-021-00225-4

Author(s): Peter B. Gibson, William E. Chapman, Alphan Altinok, Luca Delle Monache, Michael J. DeFlorio, ...

Keyword(s): Machine Learning; Large Scale; Climate Model; Learning Approaches; Model Ensemble; Learning Models; Seasonal Forecasts; The North; Climate Model Simulations; Machine Learning Models

A barrier to using machine learning in seasonal forecasting applications is the limited sample size of observational data for model training. To circumvent this issue, we explore the feasibility of training various machine learning approaches on a large climate model ensemble, which provides a long training set of physically consistent model realizations. After training on thousands of seasons of climate model simulations, the machine learning models are tested by producing seasonal forecasts across the historical observational period (1980-2020). For forecasting large-scale spatial patterns of precipitation across the western United States, we show that these machine learning-based models can compete with or outperform existing dynamical models from the North American Multi-Model Ensemble. We further show that this approach need not be considered a ‘black box’: machine learning interpretability methods can identify the relevant physical processes that lead to prediction skill.

Comparison of Multivariable Logistic Regression and Machine Learning Models for Predicting Bronchopulmonary Dysplasia or Death in Very Preterm Infants

Frontiers in Pediatrics, Vol 9, 2021. DOI: 10.3389/fped.2021.759776

Author(s): Faiza Khurshid, Helen Coo, Amal Khalil, Jonathan Messiha, Joseph Y. Ting, ...

Keyword(s): Neural Network; Machine Learning; Logistic Regression; Bronchopulmonary Dysplasia; Prediction Models; Neural Network Ensemble; Learning Models; K Nearest Neighbor; Accurate Identification; Machine Learning Models

Bronchopulmonary dysplasia (BPD) is the most prevalent and clinically significant complication of prematurity. Accurate identification of at-risk infants would enable ongoing intervention to improve outcomes. Although postnatal exposures are known to affect an infant's likelihood of developing BPD, most existing BPD prediction models do not allow risk to be evaluated at different time points and/or are not suitable for use in ethnically diverse populations. A comprehensive approach to developing clinical prediction models avoids assumptions about which method will yield optimal results by testing multiple algorithms/models. We compared the performance of machine learning and logistic regression models in predicting BPD/death. Our main cohort included infants <33 weeks' gestational age (GA) admitted to a Canadian Neonatal Network site from 2016 to 2018 (n = 9,006), with all analyses repeated for the <29 weeks' GA subcohort (n = 4,246). Models were developed to predict the composite outcome of BPD/death before discharge on days 1, 7, and 14 of admission to neonatal intensive care. Ten-fold cross-validation and a 20% hold-out sample were used to measure the area under the curve (AUC). Calibration intercepts and slopes were estimated by regressing the outcome on the log-odds of the predicted probabilities. The model AUCs ranged from 0.811 to 0.886. Model discrimination was lower in the <29 weeks' GA subcohort (AUCs 0.699–0.790). Several machine learning models (k-nearest neighbor, random forest, artificial neural network, stacking neural network ensemble) had a suboptimal calibration intercept and/or slope. The top-performing algorithms will be used to develop multinomial models and an online risk estimator for predicting BPD severity and death that does not require information on ethnicity.
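The calibration sentence describes a logistic recalibration: the binary outcome is regressed on the log-odds (logit) of each predicted probability, so that a well-calibrated model yields an intercept near 0 and a slope near 1. The sketch below fits both coefficients jointly by Newton-Raphson in pure Python, as that sentence describes; some analyses instead estimate the intercept with the slope fixed at 1, which is not shown. The function names and the synthetic data are assumptions for illustration, not the authors' analysis.

```python
import math
import random
from typing import Sequence, Tuple


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def calibration_intercept_slope(y: Sequence[int], p: Sequence[float],
                                max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit logit P(y = 1) = a + b * logit(p) by Newton-Raphson; return (a, b)."""
    z = [logit(pi) for pi in p]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, zi in zip(y, z):
            mu = sigmoid(a + b * zi)
            w = mu * (1.0 - mu)
            g0 += yi - mu          # score (gradient) for the intercept
            g1 += (yi - mu) * zi   # score (gradient) for the slope
            h00 += w
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det  # Newton step = inverse information times score
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Synthetic check: outcomes drawn from the predicted probabilities themselves.
rng = random.Random(1)
p = [rng.uniform(0.05, 0.95) for _ in range(10000)]
y = [1 if rng.random() < pi else 0 for pi in p]
a, b = calibration_intercept_slope(y, p)
over = [sigmoid(2.0 * logit(pi)) for pi in p]  # deliberately overconfident predictions
a_over, b_over = calibration_intercept_slope(y, over)
```

A slope below 1, as with the deliberately overconfident `over` predictions here, indicates predicted risks that are too extreme; a slope above 1 indicates predictions that are too moderate.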

Performance of Statistical and Machine Learning-Based Methods for Predicting Biogeographical Patterns of Fungal Productivity in Forest Ecosystems

2020. DOI: 10.21203/rs.3.rs-122045/v1

Author(s): Albert Morera, Juan Martínez de Aragón, José Antonio Bonet, Jingjing Liang, Sergio de-Miguel

Keyword(s): Machine Learning; Random Forest; Machine Learning Algorithms; Gradient Boosting; Support Vector; Learning Approaches; Learning Models; Extreme Gradient Boosting; Machine Learning Models; Modelling Approaches

Background: The prediction of biogeographical patterns from a large number of driving factors with complex interactions, correlations and non-linear dependences requires advanced analytical methods and modelling tools. This study compares different statistical and machine learning models for predicting biogeographical patterns of fungal productivity, as a case study for thoroughly assessing how well alternative modelling approaches provide accurate and ecologically consistent predictions.

Methods: We evaluated and compared the performance of two statistical modelling techniques, namely generalized linear mixed models and geographically weighted regression, and four machine learning models, namely random forest, extreme gradient boosting, support vector machine and deep learning, in predicting fungal productivity. We used a systematic methodology based on substitution, random, spatial and climatic blocking combined with principal component analysis, together with an evaluation of the ecological consistency of spatially explicit model predictions.

Results: Fungal productivity predictions were sensitive to the modelling approach and complexity. Moreover, the importance assigned to different predictors varied between machine learning modelling approaches. Decision tree-based models increased prediction accuracy by ~7% compared to other machine learning approaches and by more than 25% compared to statistical ones, and resulted in higher ecological consistency at the landscape level.

Conclusions: Whereas a large number of predictors are often used in machine learning algorithms, this study shows that proper variable selection is crucial to create robust models for extrapolation in biophysically differentiated areas. When dealing with spatio-temporal data in the analysis of biogeographical patterns, climatic blocking is proposed as a highly informative technique to use in cross-validation to assess the prediction error over larger scales. Random forest was the best approach for prediction both in sampling-like environments and in extrapolation beyond the spatial and climatic range of the modelling data.
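Climatic blocking keeps all samples from the same climatic block together, so each cross-validation fold tests on conditions the model did not see during training. The sketch below assumes a single climate variable binned at hypothetical thresholds as the block definition (the study combined blocking with principal component analysis, which is not reproduced here) and yields leave-one-block-out folds. All names and values are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def climatic_block(value: float, edges: Sequence[float]) -> int:
    """Block label = number of thresholds in `edges` that the climate value reaches."""
    return sum(1 for e in edges if value >= e)


def leave_one_block_out(block_ids: Sequence[int]) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (block, train_indices, test_indices), holding out one whole block per fold."""
    members: Dict[int, List[int]] = defaultdict(list)
    for i, b in enumerate(block_ids):
        members[b].append(i)
    for b in sorted(members):  # deterministic fold order
        test = members[b]
        train = [i for i, other in enumerate(block_ids) if other != b]
        yield b, train, test


# Hypothetical mean annual temperatures (deg C) for six plots, binned at 8 and 12.
temps = [6.5, 9.1, 13.0, 7.9, 11.2, 14.8]
blocks = [climatic_block(t, [8.0, 12.0]) for t in temps]
for block, train, test in leave_one_block_out(blocks):
    print(block, train, test)
```

Compared with random folds, held-out blocks force the model to extrapolate across climate conditions, so the resulting error estimate is usually larger but closer to what extrapolation beyond the sampled range would produce.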

Data Augmentation and Pretraining for Template-Based Retrosynthetic Prediction in Computer-Aided Synthesis Planning

2020. DOI: 10.26434/chemrxiv.11811564

Author(s): Michael Fortunato, Connor W. Coley, Brian Barnes, Klavs F. Jensen

Keyword(s): Neural Network; Machine Learning; Data Augmentation; Machine Learning Algorithms; Learning Models; The Neural Network; Computer Aided; Synthesis Planning; The One; Machine Learning Models

Machine Learning-Based Algorithms to Impute PaO2 from SpO2 Values and Development of an Online Calculator

2021. DOI: 10.21203/rs.3.rs-1053360/v1

Author(s): Shuangxia Ren, Jill A. Zupetic, Mohammadreza Tabary, Rebecca DeSensi, Mehdi Nouraie, ...

Keyword(s): Neural Network; Machine Learning; Learning Algorithms; Machine Learning Algorithms; Clinical Variable; Learning Models; ICU Patients; Non Linear; Online Calculator; Machine Learning Models

We created an online calculator that uses machine learning algorithms to impute the ratio of the partial pressure of oxygen (PaO2) to the fraction of delivered oxygen (FiO2) from the non-invasive peripheral oxygen saturation (SpO2), and compared the accuracy of the machine learning models we developed with previously published equations. We generated three machine learning algorithms (neural network, regression, and kernel-based methods), using first seven clinical features (N = 9,900 ICU events) and then three features (N = 20,198 ICU events) as model inputs. Data from mechanically ventilated ICU patients were obtained from the publicly available Medical Information Mart for Intensive Care (MIMIC-III) database and used for analysis. Relative to the seven-feature models, three features (SpO2, FiO2 and positive end-expiratory pressure, PEEP) were sufficient to impute PaO2 from SpO2. Each of the tested machine learning models imputed PaO2 from SpO2 with lower error, and predicted PaO2/FiO2 < 150 more accurately, than the previously published log-linear and non-linear equations. Imputation using data from an independent validation cohort of ICU patients (N = 133) from 2 hospitals within the University of Pittsburgh Medical Center (UPMC) showed greater accuracy for the neural network and kernel-based machine learning models than for the previously published non-linear equation.
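For context on what an equation-based imputation does, the sketch below numerically inverts one classic published non-linear relationship, Severinghaus's (1979) approximation of the oxygen-haemoglobin dissociation curve, S = 1 / (23400 / (P^3 + 150*P) + 1) with P in mmHg. This is an illustrative assumption: it is not necessarily the non-linear or log-linear equation the authors compared against, and it is not their machine learning model. Function names are hypothetical; SpO2 is taken as a fraction, and bisection works because saturation rises strictly with PO2.

```python
def severinghaus_so2(po2: float) -> float:
    """Severinghaus (1979) approximation: O2 saturation (fraction) from PO2 in mmHg."""
    return 1.0 / (23400.0 / (po2 ** 3 + 150.0 * po2) + 1.0)


def impute_pao2(spo2: float, lo: float = 1.0, hi: float = 700.0, tol: float = 1e-9) -> float:
    """Invert the curve by bisection on [lo, hi]; valid because it is monotonic in PO2."""
    if not severinghaus_so2(lo) < spo2 < severinghaus_so2(hi):
        raise ValueError("SpO2 outside the invertible range of the curve")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if severinghaus_so2(mid) < spo2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pf_ratio(pao2: float, fio2: float) -> float:
    """PaO2/FiO2 ratio with PaO2 in mmHg and FiO2 as a fraction."""
    return pao2 / fio2


# Hypothetical reading: SpO2 92% on FiO2 0.60.
pao2 = impute_pao2(0.92)
print(round(pao2, 1), pf_ratio(pao2, 0.60) < 150)
```

Because the curve flattens at high saturation, small SpO2 differences near the top of the range map to large PaO2 differences, so any inversion there is poorly conditioned.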
